GRAPHONS, CUT NORM AND DISTANCE, COUPLINGS 
AND REARRANGEMENTS 



SVANTE JANSON 

Abstract. We give a survey of basic results on the cut norm and cut
metric for graphons (and sometimes more general kernels), with emphasis
on the equivalence problem. The main results are not new, but we
add various technical complements, and a new proof of the uniqueness
theorem by Borgs, Chayes and Lovász. We allow graphons on general
probability spaces whenever possible. We also give some new results for
{0,1}-valued graphons and for pure graphons.



1. Introduction 

In the recent theory of graph limits, introduced by Lovász and Szegedy
and further developed by e.g. Borgs, Chayes, Lovász, Sós and Vesztergombi
[14], [...], a prominent role is played by graphons. These are symmetric
measurable functions $W:\Omega^2\to[0,1]$, where, in general, $\Omega$ is an
arbitrary probability space. The basic fact is that every graph limit can be
represented by a graphon (where we further may choose $\Omega=[0,1]$ if we
like); however, such representations of graph limits are far from unique,
see e.g. [3], [13], [...], [24], [48]. (This representation is essentially
equivalent to the representation by Aldous and Hoover of exchangeable arrays
of random variables, see [43] for details of this representation and [...]
for the connection, which is summarized in Appendix D.) See Appendix B for a
very brief summary.

It turns out that for studying both convergence and equivalence of
graphons, a key tool is the cut metric [13]. The purpose of this paper is to
give a survey of basic, and often elementary, facts on the cut norm and cut
metric. Most results in this paper are not new, even when we do not give a
specific reference. (Most results are in at least one of [12], [13], [14],
[24], [...].) However, the results are sometimes difficult to find in the
literature, since they are spread out over several papers, with somewhat
different versions of the definitions and assumptions; moreover, some
elementary results have only been given implicitly and without proof before.
Hence we try to collect the results and proofs here, and state them in as
general forms as we find convenient. For example, we allow general
probability spaces whenever possible. We thus add various technical
complements to previous results. We also give some new results, including
some results on {0,1}-valued graphons in Section 10 and some results on pure
graphons leading to a new proof of the uniqueness theorem by Borgs, Chayes
and Lovász [13] in Section 9.

We include below for convenience some standard facts from measure
theory, sometimes repeating standard arguments. Some general references
(from different points of view) are [...].



Remark 1.1. The basic idea of graph limits has been generalized to limits
of many other finite combinatorial objects such as weighted graphs, directed
graphs, multigraphs, bipartite graphs, hypergraphs, posets and permutations,
see for example [4], [...], [40], [...], [52]. Many results below extend in a
straightforward way to such generalizations, but for simplicity we leave
these extensions to the reader and concentrate on the standard case.

2. The setting 

Let $(\Omega,\mathcal F,\mu)$ be a probability space. (We will usually denote
this space simply by $\Omega$ or $(\Omega,\mu)$, with $\mathcal F$ and perhaps
$\mu$ being clear from the context.) Often we take $\Omega$ to be $[0,1]$ (or
$(0,1]$) with $\mu=\lambda$, the Lebesgue measure; this is sometimes
convenient, and it is often possible to reduce to this case; in fact, in
several papers on graph limits only this case is considered for convenience.
(See [38] for a general representation theorem.) However, it is also often
convenient to consider other $\Omega$, and we will here be general and allow
arbitrary probability spaces.

Nevertheless, we will often consider $[0,1]$ or $(0,1]$. Except when we
explicitly say otherwise, we will always assume that these spaces are
equipped with the Borel σ-field $\mathcal B$ and the Lebesgue measure, which
we denote by $\lambda$. (We denote the Lebesgue σ-field by $\mathcal L$; we
will occasionally use it instead of $\mathcal B$, but not without saying so.
Recall that $\mathcal L$ is the completion of $\mathcal B$, see e.g. [...].)

Remark 2.1. Our default use of $\mathcal B$ is important when we consider
mappings into $[0,1]$, but for functions defined on $[0,1]$ or $[0,1]^2$, it
often does not matter whether we use $\mathcal B$ or $\mathcal L$, since every
$\mathcal L$-measurable function is a.e. equal to a $\mathcal B$-measurable
one. In fact, it is sometimes more convenient to use $\mathcal L$.

In a few cases, we will need some technical assumptions on $\Omega$. We refer
to Appendix A for the definitions of atomless, Borel and Lebesgue probability
spaces.

We will study functions on $\Omega^2$, and various (semi)metrics on such
functions. Of course, $\Omega^2$ is itself a probability space, equipped with
the product measure $\mu^2:=\mu\times\mu$ and the product σ-field (or its
completion; this makes no difference for our purposes).

Remark 2.2. The definitions and many results can be extended to functions
on $\Omega^r$ for arbitrary $r>2$, which is the setting for hypergraph limits;
see e.g. [...] and [...].

All subsets and all functions on $\Omega$ or $\Omega^2$ that we consider will
tacitly be assumed to be measurable. We will usually identify functions that
are a.e. equal. This also means that functions only have to be defined a.e.
(In particular, this means that it does not make any significant difference
if we replace $\mathcal F$ by its completion; for example, on $[0,1]$ and
$[0,1]^2$, with Lebesgue measure, it does not matter whether we consider
Borel or Lebesgue measurable functions, cf. Remark 2.1. Moreover, in this
case it does not matter whether we take $[0,1]$, $(0,1]$ or $(0,1)$.)

The natural domain of definition for the various metrics we consider is
$L^1(\Omega^2)$, but we are really mainly interested in some subclasses.

Definition 2.3. A kernel on $\Omega$ is an integrable, symmetric function
$W:\Omega^2\to[0,\infty)$.

A standard kernel or graphon on $\Omega$ is a (measurable) symmetric function
$W:\Omega^2\to[0,1]$.

We let $\mathcal W=\mathcal W(\Omega)$ denote the set of all graphons on a
given $\Omega$.

We are mainly interested in the graphons (standard kernels), since they 
correspond to graph limits. We use kernels when we find it more natural to 
state results in this generality, but we will often consider just graphons for 
convenience, leaving possible extensions to the reader. 

Warning. The terminology varies between different authors and papers.
Kernel and graphon are used more or less interchangeably, with somewhat
different definitions in different papers. (This includes my own papers,
where again there is no consistency.) Apart from the two cases in the
definition above, one sometimes considers the intermediate case of arbitrary
bounded symmetric functions $\Omega^2\to[0,\infty)$. Moreover, sometimes one
considers $W$ with arbitrary values in $\mathbb R$, and not just $W\ge0$; for
simplicity, we will not consider this case here. (Extensions to these cases
are typically straightforward when they are possible.)

Remark 2.4. For consistency we here require $W$ to be measurable for the
product σ-field $\mathcal F\times\mathcal F$, but it makes no essential
difference if we only require $W$ to be measurable for the completion of
$\mathcal F\times\mathcal F$, since every kernel of the latter type is a.e.
equal to an $\mathcal F\times\mathcal F$-measurable kernel.

Remark 2.5. A kernel is said to be Borel if it is defined on a Borel space,
and Lebesguian if it is defined on a Lebesgue space, see Appendix A for
definitions. We sometimes have to restrict to such special kernels (which
include all common examples). Note that the difference between Borel and
Lebesguian kernels is very minor: A Lebesgue probability space is the same
as the completion of a Borel probability space. Hence, if $W$ is a Borel
kernel defined on some (Borel) space $(\Omega,\mathcal F,\mu)$, then $W$ can
also be regarded as a Lebesguian kernel defined on
$(\Omega,\overline{\mathcal F},\mu)$, where $\overline{\mathcal F}$ is the
completion of $\mathcal F$ (for $\mu$). Conversely, if $W$ is a Lebesguian
kernel defined on $(\Omega,\mathcal F,\mu)$, then $\mathcal F$ is the
completion of a sub-σ-field $\mathcal F_0$ such that $(\Omega,\mathcal F_0,\mu)$
is a Borel space. Hence $W$ is a.e. equal to some
$\mathcal F_0\times\mathcal F_0$-measurable function $W_0$, which we may
assume to be symmetric and with values in $[0,\infty)$ (in $[0,1]$ if $W$ is
a graphon); thus $W=W_0$ a.e. where $W_0$ is a Borel kernel. Consequently,
up to a.e. equivalence, the classes of Borel and Lebesguian kernels are the
same, and it is a matter of taste which version we choose when we introduce
one of these restrictions. Cf. Remark [...].

Remark 2.6. The definitions and results can be extended to the
non-symmetric case, considering instead of $\mathcal W(\Omega)$ the set of
arbitrary (measurable) functions $\Omega^2\to[0,1]$ or, more generally,
$\Omega_1\times\Omega_2\to[0,1]$. Such functions (bigraphons) appear in the
graph limit theory for bipartite graphs, see e.g. [...] and [51].



Example 2.7. Let $G$ be a (simple, undirected) graph. Then $G$ defines
naturally a graphon $W_G$, which forms a link between graphs and graphons
and is central in the graph limit theory, see e.g. [...]. In fact, there are
two natural versions, which we denote by $W_G'$ and $W_G$.

For the first version, we regard the vertex set $V$ of $G$ as a probability
space with each vertex having equal probability $1/|G|$. We define the
graphon $W_G':V^2\to[0,1]$ on this probability space by
$$W_G'(u,v):=\begin{cases}1 & \text{if } u \text{ and } v \text{ are adjacent},\\ 0 & \text{otherwise}.\end{cases} \tag{2.1}$$
In other words, $W_G'$ equals (up to notation) the adjacency matrix of $G$.

For the second version we choose the probability space $\Omega=(0,1]$. Let
$n:=|G|$ and partition $(0,1]$ into $n$ intervals
$I_{in}:=\bigl(\frac{i-1}{n},\frac{i}{n}\bigr]$. We assume that the vertices of
$G$ are labelled $1,\dots,n$ (or, equivalently, that $V=\{1,\dots,n\}$), and
define
$$W_G(x,y):=W_G'(i,j)\quad\text{if } x\in I_{in},\ y\in I_{jn}. \tag{2.2}$$

The graphons $W_G'$ and $W_G$ are equivalent in the sense defined below, see
Example 6.8. Usually it does not matter which version we choose, and we let
$W_G$ denote any of them when the choice is irrelevant.
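A minimal Python sketch of the two versions in Example 2.7, assuming the graph is given by its number of vertices $n$ and an edge list on $\{1,\dots,n\}$; the helper names are hypothetical. Exact rationals (`fractions.Fraction`) are used so that the interval $I_{in}$ containing a point, namely $i=\lceil nx\rceil$ for $x\in(0,1]$, is found without rounding errors.

```python
from fractions import Fraction
from math import ceil

def adjacency_graphon(n, edges):
    """W'_G of (2.1): the adjacency matrix of a simple graph on vertices 1..n."""
    w = [[0] * n for _ in range(n)]
    for u, v in edges:
        if u != v:  # simple graph: no loops, symmetric entries
            w[u - 1][v - 1] = 1
            w[v - 1][u - 1] = 1
    return w

def interval_index(x, n):
    """The index i with x in I_{in} = ((i-1)/n, i/n], for x in (0, 1]."""
    x = Fraction(x)
    if not 0 < x <= 1:
        raise ValueError("x must lie in (0, 1]")
    return ceil(x * n)

def step_graphon(n, edges):
    """W_G of (2.2) on (0, 1]^2, built from W'_G."""
    w = adjacency_graphon(n, edges)

    def W(x, y):
        return w[interval_index(x, n) - 1][interval_index(y, n) - 1]

    return W

# The path with edges {1, 2} and {2, 3}.
W = step_graphon(3, [(1, 2), (2, 3)])
print(W(Fraction(1, 6), Fraction(1, 2)))  # vertices 1 and 2 are adjacent: prints 1
print(W(Fraction(1, 6), Fraction(5, 6)))  # vertices 1 and 3 are not: prints 0
```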

3. Step functions 

Recall that a function $f$ on $\Omega$ is simple or a step function if there
is a finite partition $\Omega=\bigcup_{i=1}^m A_i$ such that $f$ is constant on
each $A_i$. Similarly, we say that a function $W$ on $\Omega^2$ is a step
function if there is a finite partition $\Omega=\bigcup_{i=1}^m A_i$ of
$\Omega$ such that $W$ is constant on each $A_i\times A_j$. Step functions are
also said to be of finite type. If $W$ is a kernel or graphon that also is a
step function, we call it a step kernel or step graphon.

When necessary, we may be more specific and say, for example, that $W$ is a
$\mathcal P$-step function, where $\mathcal P$ is the partition $\{A_i\}$
above, or an $n$-step function, when the number of parts $A_i$ is (at most)
$n$.

Step kernels (and graphons) are important mainly as a technical tool, see
several proofs below. However, they can also be studied for their own sake;
see Lovász and Sós [47], which can be seen as a study of step graphons,
although the results are stated in terms of the corresponding graph limits
and convergent sequences of graphs.

Remark 3.1. Note that being a step function on $\Omega^2$ is stronger than
being a simple function on that space, which means constant on the sets of
some arbitrary partition of $\Omega^2$; it is important that we use product
sets in the definition of a step function on $\Omega^2$. See also Example 5.3
below.

Warning. Some authors use different terminology. For example, when
studying functions on $[0,1]$, step functions are sometimes defined as
functions constant on some finite set of intervals partitioning $[0,1]$,
i.e., the parts $A_i$ are required to be intervals. We make no such
assumption.

4. The cut norm 
For functions in $L^1(\Omega^2)$ we have the usual $L^1$ norm
$$\|W\|_1:=\int_{\Omega^2}|W|\,d\mu^2 \tag{4.1}$$
and the corresponding metric $\|W_1-W_2\|_1$.

For the graph limit theory, it turns out that another norm is more
important. This is the cut norm $\|W\|_\square$ of $W$, which was introduced
for a different purpose by Frieze and Kannan [29], and given a central role
in the graph limit theory by Borgs, Chayes, Lovász, Sós and Vesztergombi
[...]. (Its history actually goes back much further. For functions on
$[0,1]^2$, the version in (4.3) is the same as the Fréchet variation of the
corresponding distribution function $F(x,y):=\int_0^x\int_0^y W$, see
Fréchet [28]; more generally, $\|W\|_{\square,2}$ equals the Fréchet
variation of the bimeasure on $\Omega^2$ corresponding to $W$. See further
e.g. Littlewood [46] (where also the discrete version is considered),
Clarkson and Adams [18] and Morse [53], and in particular Blei [7] with
further references.)

There are several versions of the cut norm, equivalent within constant
factors. Following [29] and [14], for $W\in L^1(\Omega^2)$ we define
$$\|W\|_{\square,1}:=\sup_{S,T}\Bigl|\int_{S\times T}W(x,y)\,d\mu(x)\,d\mu(y)\Bigr|, \tag{4.2}$$
where the supremum is taken over all pairs of measurable subsets of $\Omega$.
Alternatively, one can take
$$\|W\|_{\square,2}:=\sup_{\|f\|_\infty,\|g\|_\infty\le1}\Bigl|\int_{\Omega^2}W(x,y)f(x)g(y)\,d\mu(x)\,d\mu(y)\Bigr|, \tag{4.3}$$
taking the supremum over all (real-valued) functions $f$ and $g$ with values
in $[-1,1]$. (We let $\|f\|_\infty$ denote the norm in $L^\infty$ of $f$,
i.e., the essential supremum of $|f|$.) It is easily seen that in taking the
supremum in (4.3) one can restrict to functions $f$ and $g$ taking only the
values $\pm1$. Note that (4.2) is equivalent to (4.3) with the supremum taken
over only $f$ and $g$ with values in $\{0,1\}$ (i.e., indicator functions);
it follows, by writing a $\pm1$-valued function as a difference of two
indicator functions, that
$$\|W\|_{\square,1}\le\|W\|_{\square,2}\le4\|W\|_{\square,1}. \tag{4.4}$$
Thus the two norms $\|\cdot\|_{\square,1}$ and $\|\cdot\|_{\square,2}$ are
equivalent, and it will almost never matter which one we use. We shall write
$\|\cdot\|_\square$ for either norm, when the choice of definition does not
matter. For further, equivalent, versions of the cut norm, see Appendix E.
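On a finite probability space, or for a step function with parts of masses $p_i=\mu(A_i)$ (by the averaging argument used in the proof of Lemma 5.5), both suprema are finite maximizations. The following Python sketch, with hypothetical helper names, evaluates (4.2) by brute force over pairs of subsets and (4.3) over $\pm1$-valued $f,g$, which suffices as noted above. It is exponential in the number of points and only meant for tiny examples.

```python
from fractions import Fraction as F
from itertools import product

def bilinear(W, p, f, g):
    """Integral of W(x, y) f(x) g(y) dmu(x) dmu(y) on a finite space with point masses p."""
    n = len(p)
    return sum(W[x][y] * f[x] * g[y] * p[x] * p[y] for x in range(n) for y in range(n))

def cut_norm_1(W, p):
    """(4.2): maximize over indicator vectors of subsets S and T."""
    subsets = list(product((0, 1), repeat=len(p)))
    return max(abs(bilinear(W, p, s, t)) for s in subsets for t in subsets)

def cut_norm_2(W, p):
    """(4.3): maximize over f, g taking only the values -1 and +1."""
    signs = list(product((-1, 1), repeat=len(p)))
    return max(abs(bilinear(W, p, f, g)) for f in signs for g in signs)

def l1_norm(W, p):
    """(4.1) on a finite space."""
    n = len(p)
    return sum(abs(W[x][y]) * p[x] * p[y] for x in range(n) for y in range(n))

p = [F(1, 2), F(1, 2)]
W = [[1, 0], [0, -1]]
print(cut_norm_1(W, p), cut_norm_2(W, p), l1_norm(W, p))  # 1/4 1/2 1/2
```

In this example the upper bound in (4.4) is far from attained ($\|W\|_{\square,2}=2\|W\|_{\square,1}$), and $\int W=0$, so (4.6) below is strict on the left.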

We usually do not indicate $\Omega$ or $\mu$ explicitly in the notation; when
necessary we may add them as subscripts and write, for example,
$\|\cdot\|_{\square,\Omega,\mu}$ or $\|\cdot\|_{\square,\Omega,\mu,1}$.

Remark 4.1. Similarly, it is easily seen that (4.2) is equivalent to (4.3)
with the supremum taken over only $f$ and $g$ with values in $[0,1]$.

One advantage of the version $\|\cdot\|_{\square,2}$ is the simple "Banach
module" property: For any bounded functions $h$ and $k$ on $\Omega$,
$$\|h(x)k(y)W(x,y)\|_{\square,2}\le\|h\|_\infty\|k\|_\infty\|W\|_{\square,2}. \tag{4.5}$$
A similar advantage is seen in Lemma 4.5 below. (In both cases, using
$\|\cdot\|_{\square,1}$ would introduce some constants.) On the other hand,
$\|\cdot\|_{\square,1}$ is perhaps more natural, and probably more familiar,
in combinatorics.
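The module inequality (4.5) can be checked numerically on a finite space. The sketch below reuses a brute-force evaluation of (4.3) over $\pm1$-valued test functions; the specific $W$, $h$, $k$ and point masses are arbitrary illustrative choices (not from the text), and $W$ is allowed to take negative values since (4.5) concerns all of $L^1(\Omega^2)$. The helper names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

def cut_norm_2(W, p):
    """||W||_{box,2} on a finite space, maximizing (4.3) over f, g with values +-1."""
    n = len(p)
    signs = list(product((-1, 1), repeat=n))
    return max(
        abs(sum(W[x][y] * f[x] * g[y] * p[x] * p[y] for x in range(n) for y in range(n)))
        for f in signs for g in signs
    )

def modulate(W, h, k):
    """The function (x, y) -> h(x) k(y) W(x, y)."""
    n = len(W)
    return [[h[x] * k[y] * W[x][y] for y in range(n)] for x in range(n)]

p = [F(1, 4), F(1, 4), F(1, 2)]
W = [[1, -1, 0], [-1, 1, 1], [0, 1, -1]]
h = [F(1, 2), -1, F(1, 3)]
k = [1, F(-1, 2), F(3, 4)]
lhs = cut_norm_2(modulate(W, h, k), p)
# All point masses are positive, so the essential supremum is the plain maximum.
rhs = max(map(abs, h)) * max(map(abs, k)) * cut_norm_2(W, p)
print(lhs <= rhs)  # True, as (4.5) predicts
```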
Note that for either definition of the cut norm we have
$$\Bigl|\int_{\Omega^2}W\Bigr|\le\|W\|_\square\le\|W\|_1. \tag{4.6}$$



Remark 4.2. The definition (4.3) is natural for a functional analyst. This
norm is the dual of the projective tensor product norm in
$L^\infty(\Omega)\otimes L^\infty(\Omega)$, and is thus the injective tensor
product norm in $L^1(\Omega)\check\otimes L^1(\Omega)$; equivalently, it is
equal to the operator norm of the corresponding integral operator
$L^\infty(\Omega)\to L^1(\Omega)$. This contrasts nicely to the $L^1$ norm on
$\Omega^2$, which is the projective tensor product norm in
$L^1(\Omega)\hat\otimes L^1(\Omega)$. (See e.g. [58].)

Remark 4.3. We may similarly define the cut norm of functions defined 
on a product of two different spaces. 

Remark 4.4. The one-dimensional version of the cut norm coincides with
the $L^1$ norm. This is exact for $\|\cdot\|_{\square,2}$: If $f$ is any
integrable function on $\Omega$, then
$$\|f\|_1=\sup_{\|g\|_\infty\le1}\Bigl|\int_\Omega f(x)g(x)\,d\mu(x)\Bigr|. \tag{4.7}$$
For the one-dimensional version of $\|\cdot\|_{\square,1}$, we may in analogy
with (4.4) lose a factor 2; we omit the details.

We define the marginals of a function $W\in L^1(\Omega^2)$ by
$$W^{(1)}(x):=\int_\Omega W(x,y)\,d\mu(y), \tag{4.8}$$
$$W^{(2)}(y):=\int_\Omega W(x,y)\,d\mu(x). \tag{4.9}$$

It is a well-known consequence of Fubini's theorem that
$\|W^{(1)}\|_{L^1(\Omega)}\le\|W\|_{L^1(\Omega^2)}$ for any
$W\in L^1(\Omega^2)$. This extends to the cut norm on $\Omega^2$, even though
this norm is weaker. This is stated in the next lemma, which can be seen as a
consequence of Remark 4.4 and the fact that taking marginals (in any
product, and in any dimension) does not increase the cut norm.

Lemma 4.5. If $W\in L^1(\Omega^2)$, then
$\|W^{(1)}\|_{L^1(\Omega)},\ \|W^{(2)}\|_{L^1(\Omega)}\le\|W\|_{\square,2}$.

Proof. By symmetry, it suffices to consider $W^{(1)}$. If
$f\in L^\infty(\Omega)$, then
$$\int_\Omega W^{(1)}(x)f(x)\,d\mu(x)=\int_{\Omega^2}W(x,y)f(x)\,d\mu(x)\,d\mu(y)$$
and the result follows from (4.3), letting $g(y)=1$ and taking the supremum
over all $f$ with $\|f\|_\infty\le1$, using (4.7). (Or simply take $f(x)$
equal to the sign of $W^{(1)}(x)$.) □
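The proof of Lemma 4.5 can be traced on a finite space: with $f=\operatorname{sgn}W^{(1)}$ and $g=1$, the integral in (4.3) equals $\|W^{(1)}\|_1$ exactly. A Python sketch with hypothetical helper names and an arbitrary illustrative $W$:

```python
from fractions import Fraction as F
from itertools import product

def marginal_1(W, p):
    """W^{(1)}(x) = sum_y W(x, y) p(y): the finite-space version of (4.8)."""
    n = len(p)
    return [sum(W[x][y] * p[y] for y in range(n)) for x in range(n)]

def l1(f, p):
    """L^1 norm of a function on a finite space."""
    return sum(abs(fx) * px for fx, px in zip(f, p))

def bilinear(W, p, f, g):
    """The integral in (4.3) for given test functions f and g."""
    n = len(p)
    return sum(W[x][y] * f[x] * g[y] * p[x] * p[y] for x in range(n) for y in range(n))

def cut_norm_2(W, p):
    signs = list(product((-1, 1), repeat=len(p)))
    return max(abs(bilinear(W, p, f, g)) for f in signs for g in signs)

p = [F(1, 3)] * 3
W = [[1, -2, 0], [-2, 0, 1], [0, 1, 3]]
m1 = marginal_1(W, p)
# The test functions from the proof: f = sign of W^{(1)} and g = 1.
f = [1 if v >= 0 else -1 for v in m1]
g = [1, 1, 1]
print(l1(m1, p), bilinear(W, p, f, g), cut_norm_2(W, p) >= l1(m1, p))  # 2/3 2/3 True
```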

Remark 4.6. It is a standard fact that the step functions are dense in
$L^1(\Omega)$ and $L^1(\Omega^2)$. As a consequence, by (4.6), they are dense
also in the cut norm in these spaces.

We finally note that the cut norm really is a norm if we, as usual, identify 
functions that are equal a.e. 

Lemma 4.7. If $W\in L^1(\Omega^2)$, then $\|W\|_\square=0\iff W=0$ a.e.

Proof. Suppose that $\|W\|_\square=0$. Thus $\int_{S\times T}W(x,y)=0$ for all
subsets $S,T\subseteq\Omega$. It follows that $\int_{\Omega^2}W(x,y)f(x,y)=0$
for every step function $f$ on $\Omega^2$.

Let $g$ be any function on $\Omega^2$ with $\|g\|_\infty\le1$. Since step
functions are dense in $L^1(\Omega^2)$, there exists a sequence $g_n$ of step
functions such that $g_n\to g$ in $L^1(\Omega^2)$; by considering a
subsequence we may further assume that $g_n\to g$ a.e., and by truncating
each $g_n$ at $\pm1$ that $|g_n|\le1$. By dominated convergence,
$\int_{\Omega^2}Wg_n\to\int_{\Omega^2}Wg$, but each $\int_{\Omega^2}Wg_n=0$
since $g_n$ is a step function; hence $\int_{\Omega^2}Wg=0$. If we choose
$g:=\operatorname{sgn}(W)$, this shows that $\int_{\Omega^2}|W|=0$, and thus
$W=0$ a.e. □

5. Pull-backs and rearrangements 

Let $(\Omega_1,\mathcal F_1,\mu_1)$ and $(\Omega_2,\mathcal F_2,\mu_2)$ be two
probability spaces.

A mapping $\varphi:\Omega_1\to\Omega_2$ is measure-preserving if it is
measurable and $\mu_1(\varphi^{-1}(A))=\mu_2(A)$ for every $A\in\mathcal F_2$
(i.e., for every measurable $A\subseteq\Omega_2$).

A mapping $\varphi:\Omega_1\to\Omega_2$ is a measure-preserving bijection if
$\varphi$ is a bijection of $\Omega_1$ onto $\Omega_2$, and both $\varphi$ and
$\varphi^{-1}$ are measure-preserving. (In other words, $\varphi$ is an
isomorphism between the measure spaces $(\Omega_1,\mathcal F_1,\mu_1)$ and
$(\Omega_2,\mathcal F_2,\mu_2)$ in the category-theory sense.) Equivalently,
$\varphi$ is a measure-preserving bijection if and only if it is a bijection
that is measure-preserving, and further $\varphi^{-1}$ is measurable (and then
automatically measure-preserving). Note that if $\Omega_1$ and $\Omega_2$ are
Borel spaces, then measurability of $\varphi^{-1}$ is automatic by
Theorem A.6, so it suffices to check that $\varphi$ is a bijection and
measure-preserving.
Note that if $\varphi:\Omega_1\to\Omega_2$ is a measurable mapping, then
$\varphi\otimes\varphi:\Omega_1^2\to\Omega_2^2$ defined by
$\varphi\otimes\varphi(x,y)=(\varphi(x),\varphi(y))$ is a measurable mapping,
and if $\varphi$ is measure-preserving or a measure-preserving bijection, then
so is $\varphi\otimes\varphi$.

We define, for any functions $f$ on $\Omega_2$ and $W$ on $\Omega_2^2$, the
pull-backs
$$f^\varphi(x):=f(\varphi(x)), \tag{5.1}$$
$$W^\varphi(x,y):=W(\varphi(x),\varphi(y)); \tag{5.2}$$
these are functions on $\Omega_1$ and $\Omega_1^2$, respectively.
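For finite spaces in which all subsets are measurable, measure preservation only needs to be checked on single points, and the pull-backs (5.1) and (5.2) are just re-indexing. A minimal Python sketch under these assumptions, with hypothetical helper names; a map $\varphi$ is encoded as a list with `phi[x]` the image of the point `x`.

```python
from fractions import Fraction as F

def is_measure_preserving(phi, p1, p2):
    """mu1(phi^{-1}({b})) == mu2({b}) for every point b; enough on finite spaces."""
    image_mass = [0] * len(p2)
    for x, px in enumerate(p1):
        image_mass[phi[x]] += px
    return image_mass == list(p2)

def pull_back_function(f, phi):
    """f^phi(x) = f(phi(x)), as in (5.1)."""
    return [f[phi[x]] for x in range(len(phi))]

def pull_back_kernel(W, phi):
    """W^phi(x, y) = W(phi(x), phi(y)), as in (5.2)."""
    n = len(phi)
    return [[W[phi[x]][phi[y]] for y in range(n)] for x in range(n)]

p1 = [F(1, 4), F(1, 4), F(1, 2)]  # Omega_1 = {0, 1, 2}
p2 = [F(1, 2), F(1, 2)]           # Omega_2 = {0, 1}
phi = [0, 0, 1]                   # measure-preserving, but not injective
print(is_measure_preserving(phi, p1, p2))       # True
print(pull_back_kernel([[1, 0], [0, 0]], phi))  # [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
```

The pulled-back kernel is again symmetric, and it is a step function with respect to the partition $\{\varphi^{-1}(0),\varphi^{-1}(1)\}$ of $\Omega_1$.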

We will only consider measure-preserving $\varphi$. In the special case that
$\varphi$ is a measure-preserving bijection, we say that $f^\varphi$ and
$W^\varphi$ are rearrangements of $f$ and $W$. (However, we will not assume
that $\varphi$ is injective or bijective unless we say so explicitly.) We
further say that $W'$ is an a.e. rearrangement of $W$ if $W'=W^\varphi$ a.e.
where $W^\varphi$ is a rearrangement of $W$. Note that the relation "$W_1$ is
a rearrangement of $W_2$" is symmetric and, moreover, an equivalence
relation, and similarly for a.e. rearrangements.

Remark 5.1. Note that if $W$ is symmetric, then $W^\varphi$ is too by (5.2);
recall that this is the case we really are interested in.

If we want to study general $W$, for example in connection with bipartite
graphs as mentioned in Remark 2.6, it is often more natural to allow
different maps $\varphi_1$ and $\varphi_2$ acting on the two coordinates.

Remark 5.2. Instead of measure-preserving bijections, it may be convenient
to consider measure-preserving almost bijections, which are mappings
$\varphi$ that are measure-preserving bijections
$\Omega_1\setminus N_1\to\Omega_2\setminus N_2$ for some null sets $N_1$ and
$N_2$. This makes essentially no difference below, and we leave the details
to the reader. (See Theorem [...](vii) for a situation where almost
bijections occur.)

Example 5.3. A kernel is a step kernel if and only if it is a pull-back
$W^\varphi$ of some kernel $W$ defined on a finite probability space. (The
same holds for general functions $\Omega^2\to\mathbb R$. Recall that step
functions are the same as functions of finite type.)

Remark 5.4. We take here the point of view that $(\Omega_1,\mu_1)$ and
$(\Omega_2,\mu_2)$ are given probability spaces, and we consider suitable
maps between them. A closely related idea is to take a probability space
$(\Omega_1,\mu_1)$ and a measurable space $\Omega_2$ (without any particular
measure). A measurable map $\varphi:\Omega_1\to\Omega_2$ then maps the
measure $\mu_1$ to a measure on $\Omega_2$ given by
$\varphi_*\mu_1(A):=\mu_1(\varphi^{-1}(A))$ for all $A\subseteq\Omega_2$. Note
that $\varphi_*\mu_1$ is the unique measure on $\Omega_2$ that makes $\varphi$
measure-preserving. This well-known construction (called push-forward) can
be seen as a dual to the pull-back above; note that measures map forward,
from $\Omega_1$ to $\Omega_2$, while functions map backward, from $\Omega_2$
to $\Omega_1$.

Note that, on the contrary, given a measurable map
$\varphi:\Omega_1\to\Omega_2$ between two measurable spaces, and a
probability measure $\mu_2$ on $\Omega_2$, there is in general no measure
$\mu_1$ on $\Omega_1$ which makes $\varphi$ measure-preserving. This is a
source of some of the technical difficulties in the theory.
It is easy to see that the norms defined above are invariant under
rearrangements, and more generally under pull-backs by measure-preserving
maps:

Lemma 5.5. If $\varphi$ is measure-preserving, then, taking the norms in the
respective spaces, for any $f\in L^1(\Omega_2)$ and $W\in L^1(\Omega_2^2)$,
$$\|f^\varphi\|_1=\|f\|_1,\qquad\|W^\varphi\|_1=\|W\|_1, \tag{5.3}$$
$$\|W^\varphi\|_\square=\|W\|_\square. \tag{5.4}$$

Proof. The equalities (5.3) are standard.

The cut norm equality (5.4) is obvious if $\varphi$ is a measure-preserving
bijection. In general, it seems simplest to first assume that $W$ is a step
function, so that $W$ is constant on each $A_i\times A_j$ for some partition
$\Omega_2=\bigcup_iA_i$, say $W=w_{ij}$ on $A_i\times A_j$. Then
$A_i':=\varphi^{-1}(A_i)$ defines a partition of $\Omega_1$, and $W^\varphi$ is
a step function constant on each $A_i'\times A_j'$ and equal to $w_{ij}$
there.

Consider first $\|W\|_{\square,2}$. In the definition (4.3), we may replace
$f$ by its average on each $A_i$ (i.e., by its conditional expectation given
the partition) without changing the integral, and similarly for $g$. This
shows that it is enough to consider $f$ and $g$ that are constant on each
$A_i$, and we find
$$\|W\|_{\square,2}=\sup\Bigl|\sum_{i,j}w_{ij}a_ib_j\,\mu_2(A_i)\mu_2(A_j)\Bigr|, \tag{5.5}$$
taking the supremum over all real numbers $a_i$ and $b_j$ with
$|a_i|,|b_j|\le1$. Since $\mu_1(A_i')=\mu_2(A_i)$, the same argument shows
that $\|W^\varphi\|_{\square,2}$ is given by the same quantity, and thus (5.4)
holds in this case.

For $\|\cdot\|_{\square,1}$ we argue for step functions in exactly the same
way, using Remark 4.1 and taking $a_i,b_j\in[0,1]$ in (5.5).

For a general $W$, let $\varepsilon>0$ and let $W_1$ be a step function on
$\Omega_2^2$ such that $\|W-W_1\|_1<\varepsilon$. Then
$$\|W-W_1\|_\square\le\|W-W_1\|_1<\varepsilon.$$
Further, $\|W_1^\varphi\|_\square=\|W_1\|_\square$ by what we just have shown,
and
$$\|W^\varphi-W_1^\varphi\|_\square\le\|W^\varphi-W_1^\varphi\|_1=\|(W-W_1)^\varphi\|_1=\|W-W_1\|_1<\varepsilon.$$
The result $\|W^\varphi\|_\square=\|W\|_\square$ follows by some applications
of the triangle inequality. □
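Lemma 5.5 can be checked on small finite examples, where a non-injective measure-preserving map is easy to write down. The following Python sketch (hypothetical helper names; brute force, so only for a handful of points) compares $\|W^\varphi\|_{\square,1}$ with $\|W\|_{\square,1}$, and the $L^1$ norms, for one such map. It illustrates, and of course does not prove, (5.3) and (5.4).

```python
from fractions import Fraction as F
from itertools import product

def cut_norm_1(W, p):
    """||W||_{box,1} on a finite space: brute force over all pairs of subsets."""
    n = len(p)
    subsets = list(product((0, 1), repeat=n))
    return max(
        abs(sum(W[x][y] * s[x] * t[y] * p[x] * p[y] for x in range(n) for y in range(n)))
        for s in subsets for t in subsets
    )

def l1_norm(W, p):
    n = len(p)
    return sum(abs(W[x][y]) * p[x] * p[y] for x in range(n) for y in range(n))

def pull_back_kernel(W, phi):
    """W^phi(x, y) = W(phi(x), phi(y))."""
    n = len(phi)
    return [[W[phi[x]][phi[y]] for y in range(n)] for x in range(n)]

p2 = [F(1, 2), F(1, 2)]           # Omega_2 = {0, 1}
p1 = [F(1, 4), F(1, 4), F(1, 2)]  # Omega_1 = {0, 1, 2}
phi = [0, 0, 1]                   # measure-preserving, not injective
W = [[1, -2], [-2, 3]]
W_phi = pull_back_kernel(W, phi)
print(cut_norm_1(W_phi, p1) == cut_norm_1(W, p2), l1_norm(W_phi, p1) == l1_norm(W, p2))
```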

However, the distances $\|W_1-W_2\|_1$ and $\|W_1-W_2\|_\square$ between two
kernels are, in general, not invariant under rearrangements of just one of
the kernels, since, in general, $\|W-W^\varphi\|_\square\ne0$ for a kernel $W$
on a space $\Omega$ and a measure-preserving bijection
$\varphi:\Omega\to\Omega$. In the graph limit theory, we need a metric space
where all rearrangements are equivalent (and thus have distance 0 to each
other); we obtain this by taking the infimum over rearrangements.
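The simplest instance of this phenomenon: on the two-point space with equal masses, let $W$ be the indicator that both coordinates equal the first point, and let $\psi$ swap the two points. A brute-force Python sketch (hypothetical helper name) gives $\|W-W^\psi\|_{\square,1}=\frac14\ne0$, although $\psi$ is a measure-preserving bijection.

```python
from fractions import Fraction as F
from itertools import product

def cut_norm_1(W, p):
    """||W||_{box,1} on a finite space: brute force over all pairs of subsets."""
    n = len(p)
    subsets = list(product((0, 1), repeat=n))
    return max(
        abs(sum(W[x][y] * s[x] * t[y] * p[x] * p[y] for x in range(n) for y in range(n)))
        for s in subsets for t in subsets
    )

p = [F(1, 2), F(1, 2)]
W = [[1, 0], [0, 0]]  # indicator of x = y = first point
psi = [1, 0]          # swaps the two points: a measure-preserving bijection
W_psi = [[W[psi[x]][psi[y]] for y in range(2)] for x in range(2)]
D = [[W[x][y] - W_psi[x][y] for y in range(2)] for x in range(2)]
print(cut_norm_1(D, p))  # 1/4
```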

Given two kernels $W_1,W_2$ on $[0,1]$, the cut metric of Borgs, Chayes,
Lovász, Sós and Vesztergombi [13] may be defined by
$$\delta_\square(W_1,W_2)=\inf_\varphi\|W_1-W_2^\varphi\|_\square, \tag{5.6}$$
taking the infimum over all measure-preserving bijections
$\varphi:[0,1]\to[0,1]$; in other words, over all rearrangements $W_2^\varphi$
of $W_2$. (If we wish to specify which version of the cut norm is involved,
we write $\delta_{\square,1}$ or $\delta_{\square,2}$.) Borgs, Chayes, Lovász,
Sós and Vesztergombi [14] showed that for kernels on $[0,1]$, there are
several equivalent definitions of $\delta_\square$, see Theorem 6.9 below. For
general probability spaces $\Omega$, we have to use couplings of the
underlying probability spaces instead of rearrangements, see the following
section; it then further is irrelevant whether the kernels are defined on the
same probability space or not.

On the other hand, if we restrict ourselves to $[0,1]$, we can do with a
special simple case of rearrangements. Following Borgs, Chayes, Lovász, Sós
and Vesztergombi [13], we define an $n$-step interval permutation to be the
map $\tilde\sigma$ defined for a permutation $\sigma$ of $\{1,\dots,n\}$ by
taking the partition $(0,1]=\bigcup_{i=1}^n I_{in}$ with
$I_{in}:=((i-1)/n,i/n]$ and mapping each $I_{in}$ by translation to
$I_{\sigma(i),n}$. (For completeness we also let $\tilde\sigma(0)=0$.)
Evidently, $\tilde\sigma$ is a measure-preserving bijection $[0,1]\to[0,1]$.
We shall see in Theorem 6.9 below that it suffices to use such interval
permutations in (5.6).
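A minimal Python sketch of the $n$-step interval permutation $\tilde\sigma$, assuming $\sigma$ is encoded as a list with `sigma[i-1]` $=\sigma(i)$ (a hypothetical encoding). Exact rationals locate the interval $I_{in}$ containing $x$, namely $i=\lceil nx\rceil$ for $x\in(0,1]$, after which $x$ is translated by $(\sigma(i)-i)/n$.

```python
from fractions import Fraction
from math import ceil

def interval_permutation(sigma):
    """sigma-tilde for a permutation sigma of {1, ..., n}, encoded as a list with
    sigma[i - 1] == sigma(i)."""
    n = len(sigma)
    if sorted(sigma) != list(range(1, n + 1)):
        raise ValueError("sigma must be a permutation of 1..n")

    def sigma_tilde(x):
        x = Fraction(x)
        if x == 0:
            return Fraction(0)  # the convention sigma-tilde(0) = 0
        if not 0 < x <= 1:
            raise ValueError("x must lie in [0, 1]")
        i = ceil(x * n)  # x lies in I_{in} = ((i-1)/n, i/n]
        return x + Fraction(sigma[i - 1] - i, n)  # translate I_{in} onto I_{sigma(i),n}

    return sigma_tilde

st = interval_permutation([2, 3, 1])
print(st(Fraction(1, 6)), st(Fraction(1, 2)), st(1))  # 1/2 5/6 1/3
```

Restricted to the grid $\{k/12: 0\le k\le12\}$, this $\tilde\sigma$ permutes the grid points, as a bijection assembled from translations of the intervals $I_{i3}$ should.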



Example 5.6. To see one problem caused by using (5.6) for kernels on a
general probability space, let $\Omega$ be the two-point space $\{1,2\}$, and
let $\mu\{1\}=\frac12-\varepsilon$, $\mu\{2\}=\frac12+\varepsilon$, for some
small $\varepsilon>0$. Let $W_1(x,y):=\mathbf 1\{x=y=1\}$ and
$W_2(x,y):=\mathbf 1\{x=y=2\}$. On this probability space there is no
measure-preserving bijection except the identity, so (5.6) yields
$\|W_1-W_2\|_{\square,1}=(\frac12+\varepsilon)^2>\frac14$, while the coupling
definition (6.1) below yields $\delta_\square(W_1,W_2)=2\varepsilon$.
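The numbers in Example 5.6 can be reproduced with exact arithmetic. The Python sketch below takes $\varepsilon=1/10$ (an arbitrary choice), evaluates $\|W_1-W_2\|_{\square,1}$ directly, and evaluates $\|W_1^{\varphi_1}-W_2^{\varphi_2}\|_{\square,1}$ for one explicit coupling on a three-point space; that coupling is our own illustrative choice, not taken from the text, and the helper names are hypothetical. It gives the upper bound $2\varepsilon$; the matching lower bound $|\int W_1-\int W_2|=2\varepsilon$ follows from the first inequality in (4.6), since pull-backs preserve integrals.

```python
from fractions import Fraction as F
from itertools import product

def cut_norm_1(W, p):
    """||W||_{box,1} on a finite space: brute force over all pairs of subsets."""
    n = len(p)
    subsets = list(product((0, 1), repeat=n))
    return max(
        abs(sum(W[x][y] * s[x] * t[y] * p[x] * p[y] for x in range(n) for y in range(n)))
        for s in subsets for t in subsets
    )

def push_forward(phi, nu, size):
    """The image measure of nu under phi on the finite set {0, ..., size - 1}."""
    out = [0] * size
    for s, m in enumerate(nu):
        out[phi[s]] += m
    return out

eps = F(1, 10)
mu = [F(1, 2) - eps, F(1, 2) + eps]  # the points 1 and 2 are stored as indices 0 and 1
W1 = [[1, 0], [0, 0]]                # 1{x = y = 1}
W2 = [[0, 0], [0, 1]]                # 1{x = y = 2}

# (5.6): only the identity is available.
direct = cut_norm_1([[W1[x][y] - W2[x][y] for y in range(2)] for x in range(2)], mu)

# Illustrative coupling: points a, b, c are mapped by (phi1, phi2) to (1, 2), (2, 1), (2, 2).
nu = [F(1, 2) - eps, F(1, 2) - eps, 2 * eps]
phi1, phi2 = [0, 1, 1], [1, 0, 1]
is_coupling = push_forward(phi1, nu, 2) == mu and push_forward(phi2, nu, 2) == mu
U = [[W1[phi1[s]][phi1[t]] - W2[phi2[s]][phi2[t]] for t in range(3)] for s in range(3)]
coupled = cut_norm_1(U, nu)

# Lower bound from (4.6): |int W1 - int W2| <= delta_box(W1, W2).
lower = abs(mu[0] ** 2 - mu[1] ** 2)
print(is_coupling, direct, coupled, lower)  # True 9/25 1/5 1/5
```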



6. Couplings and the cut metric 

Given two probability spaces $(\Omega_1,\mu_1)$ and $(\Omega_2,\mu_2)$, a
coupling of these spaces is a pair of measure-preserving maps
$\varphi_i:\Omega\to\Omega_i$, $i=1,2$, defined on a common (but arbitrary)
probability space $(\Omega,\mu)$.

Remark 6.1. Couplings are more common in the context of two random
variables, say $X_1$ and $X_2$. These are often real-valued, but may more
generally take values in any measurable spaces $\Omega_1$ and $\Omega_2$. A
coupling of $X_1$ and $X_2$ then is a pair $(X_1',X_2')$ of random variables
defined on a common probability space such that $X_1'\overset{\mathrm d}{=}X_1$
and $X_2'\overset{\mathrm d}{=}X_2$ (equality in distribution). This is the
same as a coupling of the two probability spaces $(\Omega_1,\mu_1)$ and
$(\Omega_2,\mu_2)$ according to our definition above, where $\mu_1$ is the
distribution of $X_1$ and $\mu_2$ the distribution of $X_2$.

The general definition of the cut metric, for kernels defined on arbitrary
probability spaces (possibly different ones), is as follows.

Given kernels $W_i$ on $\Omega_i$, $i=1,2$, or more generally any functions
$W_i\in L^1(\Omega_i^2)$, we define the cut metric by
$$\delta_\square(W_1,W_2)=\inf\|W_1^{\varphi_1}-W_2^{\varphi_2}\|_\square, \tag{6.1}$$
where the infimum is taken over all couplings $(\varphi_1,\varphi_2)$ of
$\Omega_1$ and $\Omega_2$ (with $\Omega$ arbitrary), and $W_i^{\varphi_i}$ is
the pull-back defined in (5.2). We similarly define
$$\delta_1(W_1,W_2)=\inf\|W_1^{\varphi_1}-W_2^{\varphi_2}\|_{L^1(\Omega^2)}, \tag{6.2}$$
again taking the infimum over all couplings $(\varphi_1,\varphi_2)$ of
$\Omega_1$ and $\Omega_2$.

Remark 6.2. It is not obvious that the definition (6.1) agrees with (5.6)
for kernels on $[0,1]$, but, as shown in [14], this is the case; see
Theorem 6.9 below. Note, in somewhat greater generality, that if $W_1$ and
$W_2$ are kernels on probability spaces $\Omega_1$ and $\Omega_2$, and
$\varphi:\Omega_1\to\Omega_2$ is measure-preserving, then $(\iota,\varphi)$ is
a coupling defined on $\Omega_1$. (We let here and below $\iota$ denote the
identity map in any space.) Hence, we always have
$\delta_\square(W_1,W_2)\le\|W_1-W_2^\varphi\|_\square$.

Note that $\delta_\square$ and $\delta_1$ really are pseudometrics rather
than metrics, since $\delta_\square(W_1,W_2)=0$ and $\delta_1(W_1,W_2)=0$ in
many cases with $W_1\ne W_2$, for example if $W_1=W_2^\varphi$ for a
measure-preserving $\varphi$ (use the coupling $(\iota,\varphi)$ in (6.1) and
(6.2); see Remark 6.2). Nevertheless, it is customary to call this
pseudometric the cut metric. We will return to the important problem of
when $\delta_\square(W_1,W_2)=0$ in Section 8.

It is obvious from the definitions (6.1) and (6.2) that $\delta_\square$ and
$\delta_1$ are non-negative and symmetric, and
$\delta_\square(W,W)=\delta_1(W,W)=0$ for every $W$. It is less obvious that
they really are subadditive, i.e., that the triangle inequality holds, so we
give a detailed proof in Lemma 6.5 below.

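A coupling $(\varphi_1,\varphi_2)$ of two probability spaces
$(\Omega_1,\mu_1)$ and $(\Omega_2,\mu_2)$, with $\varphi_1,\varphi_2$ defined
on $(\Omega,\mu)$, defines a map
$\Phi:=(\varphi_1,\varphi_2):\Omega\to\Omega_1\times\Omega_2$, which induces a
unique measure $\bar\mu$ on $\Omega_1\times\Omega_2$ such that
$\Phi:(\Omega,\mu)\to(\Omega_1\times\Omega_2,\bar\mu)$ is measure-preserving
(see Remark 5.4). Let $\pi_i:\Omega_1\times\Omega_2\to\Omega_i$ be the
projection; then $\varphi_i=\pi_i\circ\Phi$, $i=1,2$. Note that if
$A\subseteq\Omega_i$, then
$$\bar\mu\bigl(\pi_i^{-1}(A)\bigr)=\mu\bigl(\Phi^{-1}(\pi_i^{-1}(A))\bigr)=\mu\bigl(\varphi_i^{-1}(A)\bigr)=\mu_i(A),$$
since $\Phi$ and $\varphi_i$ are measure-preserving; thus
$\pi_i:(\Omega_1\times\Omega_2,\bar\mu)\to(\Omega_i,\mu_i)$ is
measure-preserving. Hence, $(\pi_1,\pi_2)$ is a coupling of
$(\Omega_1,\mu_1)$ and $(\Omega_2,\mu_2)$. If $W_i\in L^1(\Omega_i^2)$, then
$W_i^{\varphi_i}=(W_i^{\pi_i})^\Phi$ and thus, using Lemma 5.5,
$$\|W_1^{\varphi_1}-W_2^{\varphi_2}\|_\square=\|(W_1^{\pi_1}-W_2^{\pi_2})^\Phi\|_\square=\|W_1^{\pi_1}-W_2^{\pi_2}\|_\square. \tag{6.3}$$
Consequently, in (6.1) it suffices to consider couplings of the type
$(\pi_1,\pi_2)$ defined on $(\Omega_1\times\Omega_2,\bar\mu)$, where $\bar\mu$
is a probability measure such that $\pi_1$ and $\pi_2$ are
measure-preserving, i.e., such that $\bar\mu$ has the correct marginals
$\mu_1$ and $\mu_2$.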
Before proving the triangle inequality, we prove a technical lemma and a 
partial result. 

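Lemma 6.3. Let $\Omega_1$ and $\Omega_2$ be probability spaces and let
$W_1\in L^1(\Omega_1^2)$ and $W_2\in L^1(\Omega_2^2)$ be step functions with
corresponding partitions $\Omega_1=\bigcup_iA_i$ and $\Omega_2=\bigcup_jB_j$.
If $(\varphi_1,\varphi_2)$ and $(\varphi_1',\varphi_2')$ are two couplings of
$\Omega_1$ and $\Omega_2$, defined on $(\Omega,\mu)$ and $(\Omega',\mu')$
respectively, such that
$\mu\bigl(\varphi_1^{-1}(A_i)\cap\varphi_2^{-1}(B_j)\bigr)=\mu'\bigl(\varphi_1'^{-1}(A_i)\cap\varphi_2'^{-1}(B_j)\bigr)$
for every $i$ and $j$, then
$\|W_1^{\varphi_1}-W_2^{\varphi_2}\|_{\square,\mu}=\|W_1^{\varphi_1'}-W_2^{\varphi_2'}\|_{\square,\mu'}$
and, similarly,
$\|W_1^{\varphi_1}-W_2^{\varphi_2}\|_{1,\mu}=\|W_1^{\varphi_1'}-W_2^{\varphi_2'}\|_{1,\mu'}$.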

Proof. Recall that $\|W_1^{\varphi_1}-W_2^{\varphi_2}\|_\square$ is given by
(4.3), in case of $\|\cdot\|_{\square,1}$ further assuming that $f$ and $g$
take values in $[0,1]$, see Remark 4.1. Since $W_1^{\varphi_1}-W_2^{\varphi_2}$
is constant on each product $C_{ij}\times C_{kl}$ of the sets
$C_{ij}:=\varphi_1^{-1}(A_i)\cap\varphi_2^{-1}(B_j)$, we may as in the proof
of Lemma 5.5 average $f$ and $g$ in (4.3) over each such set, so it suffices
to consider $f$ and $g$ that are constant on each set $C_{ij}$.
Consequently, if $W_1=u_{ik}$ on $A_i\times A_k$ and $W_2=v_{jl}$ on
$B_j\times B_l$, then
$$\|W_1^{\varphi_1}-W_2^{\varphi_2}\|_{\square,\mu}=\max_{(f_{ij}),(g_{kl})}\Bigl|\sum_{i,j,k,l}\mu(C_{ij})\mu(C_{kl})(u_{ik}-v_{jl})f_{ij}g_{kl}\Bigr|, \tag{6.4}$$
taking the maximum over all arrays $(f_{ij})$ and $(g_{kl})$ of numbers in
$[0,1]$ for $\|\cdot\|_{\square,1}$ and in $[-1,1]$ for
$\|\cdot\|_{\square,2}$. This depends on the coupling only through the
numbers $\mu(C_{ij})$, and the result follows.

For the $L^1$ norm we have immediately, with the same notation,
$$\|W_1^{\varphi_1}-W_2^{\varphi_2}\|_{1,\mu}=\sum_{i,j,k,l}\mu(C_{ij})\mu(C_{kl})\,|u_{ik}-v_{jl}|,$$
and the result follows. □
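Lemma 6.3 can be illustrated with three couplings, given as matrices on $\Omega_1\times\Omega_2$, that differ pointwise but put the same mass $\frac14$ on every $A_i\times B_j$. In the Python sketch below (hypothetical names; the step functions and couplings are arbitrary illustrative choices), $W_1$ lives on three points but only distinguishes $A_1=\{0,1\}$ from $A_2=\{2\}$. For a fixed $S$, the best $T$ in (4.2) collects either all positive or all negative column sums, which keeps the search small.

```python
from fractions import Fraction as F
from itertools import product

def weighted_difference(W1, W2, Q):
    """On the support of the coupling matrix Q (points (i, j) of Omega1 x Omega2),
    return M[s][t] = (W1^{pi1} - W2^{pi2})(s, t) * Q(s) * Q(t)."""
    pts = [(i, j) for i in range(len(Q)) for j in range(len(Q[0])) if Q[i][j] != 0]
    mass = [Q[i][j] for i, j in pts]
    return [[(W1[a][c] - W2[b][d]) * ms * mt for (c, d), mt in zip(pts, mass)]
            for (a, b), ms in zip(pts, mass)]

def cut_norm_1(M):
    """max over S, T of |sum of M over S x T|, optimizing T exactly for each S."""
    m = len(M)
    best = 0
    for S in product((0, 1), repeat=m):
        cols = [sum(M[s][t] for s in range(m) if S[s]) for t in range(m)]
        best = max(best, sum(c for c in cols if c > 0), -sum(c for c in cols if c < 0))
    return best

def l1_norm(M):
    return sum(abs(v) for row in M for v in row)

W1 = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]  # step function: A_1 = {0, 1}, A_2 = {2}
W2 = [[0, 0], [0, 1]]                   # B_1 = {0}, B_2 = {1}
q = F(1, 4)
couplings = {
    "a": [[q, 0], [0, q], [q, q]],
    "b": [[0, q], [q, 0], [q, q]],
    "c": [[q / 2, q / 2], [q / 2, q / 2], [q, q]],
}
# Every coupling puts mass 1/4 on each A_i x B_j, so Lemma 6.3 applies.
for name, Q in couplings.items():
    M = weighted_difference(W1, W2, Q)
    print(name, cut_norm_1(M), l1_norm(M))  # each line: <name> 3/16 3/8
```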

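Lemma 6.4. Let $\Omega_1$ and $\Omega_2$ be probability spaces and
$W_1,W_1'\in L^1(\Omega_1^2)$ and $W_2\in L^1(\Omega_2^2)$. Then
$\delta_\square(W_1,W_2)\le\delta_\square(W_1',W_2)+\|W_1-W_1'\|_\square$ and,
similarly, $\delta_1(W_1,W_2)\le\delta_1(W_1',W_2)+\|W_1-W_1'\|_1$.

Proof. Let $(\varphi_1,\varphi_2)$ be a coupling of $\Omega_1$ and $\Omega_2$.
Then, using Lemma 5.5,
$$\begin{aligned}
\delta_\square(W_1,W_2)&\le\|W_1^{\varphi_1}-W_2^{\varphi_2}\|_\square\le\|(W_1')^{\varphi_1}-W_2^{\varphi_2}\|_\square+\|W_1^{\varphi_1}-(W_1')^{\varphi_1}\|_\square\\
&=\|(W_1')^{\varphi_1}-W_2^{\varphi_2}\|_\square+\|(W_1-W_1')^{\varphi_1}\|_\square\\
&=\|(W_1')^{\varphi_1}-W_2^{\varphi_2}\|_\square+\|W_1-W_1'\|_\square.
\end{aligned}$$
The result for $\delta_\square$ follows by taking the infimum over all
couplings. The proof for $\delta_1$ is the same. □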

Lemma 6.5. Let, for $i=1,2,3$, $\Omega_i$ be a probability space and
$W_i\in L^1(\Omega_i^2)$. Then
$\delta_\square(W_1,W_3)\le\delta_\square(W_1,W_2)+\delta_\square(W_2,W_3)$
and, similarly, $\delta_1(W_1,W_3)\le\delta_1(W_1,W_2)+\delta_1(W_2,W_3)$.
Hence $\delta_\square$ and $\delta_1$ are (pseudo)metrics.

Proof. Roughly speaking, given a coupling of $\Omega_1$ and $\Omega_2$ and
another coupling of $\Omega_2$ and $\Omega_3$, we want to couple the
couplings so that we can compare pull-backs of $W_1$ and $W_3$. This simple
idea, unfortunately, leads to technical difficulties in general, but it works
easily if, for example, the spaces are finite. We therefore use an
approximation argument with step functions which essentially reduces to the
finite case.

Thus, suppose first that $W_1,W_2,W_3$ are step functions with corresponding
partitions $\Omega_1=\bigcup_iA_i$, $\Omega_2=\bigcup_jB_j$,
$\Omega_3=\bigcup_kC_k$, and assume for simplicity that $\mu_1(A_i)$,
$\mu_2(B_j)$ and $\mu_3(C_k)$ are non-zero for all $i,j,k$. ($\mu_\ell$
denotes the measure on $\Omega_\ell$.)

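We consider $\delta_\square$; the proof for $\delta_1$ is the same. Let $\varepsilon > 0$. By the definition of $\delta_\square$ and the comments just made (see (6.3)), there exist measures $\mu'$ on $\Omega_1\times\Omega_2$ and $\mu''$ on $\Omega_2\times\Omega_3$, with marginals $\mu_\ell$ on $\Omega_\ell$, such that

$$
\|W_1^{\pi_1} - W_2^{\pi_2}\|_{\square,\mu'} < \delta_\square(W_1,W_2) + \varepsilon, \tag{6.5}
$$

$$
\|W_2^{\pi_2} - W_3^{\pi_3}\|_{\square,\mu''} < \delta_\square(W_2,W_3) + \varepsilon. \tag{6.6}
$$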

(We abuse notation a little by letting $\pi_\ell$ denote the projection onto $\Omega_\ell$ from any product space.)

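Define a measure $\mu$ on $\Omega_1\times\Omega_2\times\Omega_3$ by, for $E \subseteq \Omega_1\times\Omega_2\times\Omega_3$,

$$
\mu(E) := \sum_{i,j,k} \frac{\mu'(A_i\times B_j)\,\mu''(B_j\times C_k)}{\mu_2(B_j)} \cdot \frac{\mu_1\times\mu_2\times\mu_3\bigl(E\cap(A_i\times B_j\times C_k)\bigr)}{\mu_1(A_i)\,\mu_2(B_j)\,\mu_3(C_k)}.
$$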

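We have $\mu_1(A_i) = \sum_j \mu'(A_i\times B_j)$, $\mu_2(B_j) = \sum_i \mu'(A_i\times B_j) = \sum_k \mu''(B_j\times C_k)$, and $\mu_3(C_k) = \sum_j \mu''(B_j\times C_k)$. It follows that the three mappings $\pi_\ell : (\Omega_1\times\Omega_2\times\Omega_3,\mu) \to (\Omega_\ell,\mu_\ell)$ are measure-preserving since, for example, if $F \subseteq \Omega_1$, then $\pi_1^{-1}(F) = F\times\Omega_2\times\Omega_3$ and

$$
\begin{aligned}
\mu\bigl(\pi_1^{-1}(F)\bigr) &= \mu(F\times\Omega_2\times\Omega_3) \\
&= \sum_{i,j,k} \frac{\mu'(A_i\times B_j)\,\mu''(B_j\times C_k)}{\mu_2(B_j)} \cdot \frac{\mu_1\times\mu_2\times\mu_3\bigl((F\cap A_i)\times B_j\times C_k\bigr)}{\mu_1(A_i)\,\mu_2(B_j)\,\mu_3(C_k)} \\
&= \sum_{i,j,k} \frac{\mu'(A_i\times B_j)\,\mu''(B_j\times C_k)}{\mu_2(B_j)} \cdot \frac{\mu_1(F\cap A_i)}{\mu_1(A_i)} \\
&= \sum_{i,j} \mu'(A_i\times B_j)\,\frac{\mu_1(F\cap A_i)}{\mu_1(A_i)} = \sum_i \mu_1(F\cap A_i) = \mu_1(F).
\end{aligned}
$$

In particular, $\mu$ is a probability measure.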
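When every part $A_i$, $B_j$, $C_k$ is a single atom, the measure $\mu$ above reduces to $\mu\{(i,j,k)\} = \mu'(A_i\times B_j)\,\mu''(B_j\times C_k)/\mu_2(B_j)$: the two couplings are glued along their common marginal on $\Omega_2$ (conditionally independent given the middle coordinate). A minimal sketch with exact rational arithmetic, assuming finite spaces with all $\mu_2(B_j) > 0$; the helper name `glue` is hypothetical:

```python
from fractions import Fraction


def glue(mu12, mu23):
    """Couple two couplings through the common middle space (finite case).

    mu12[i][j]: a coupling of mu1 and mu2 (masses of A_i x B_j)
    mu23[j][k]: a coupling of mu2 and mu3 (masses of B_j x C_k)
    Returns mu[i][j][k] = mu12[i][j] * mu23[j][k] / mu2[j], which is the
    measure defined above when every part is a single atom.
    """
    n1, n2, n3 = len(mu12), len(mu23), len(mu23[0])
    mu2_from_12 = [sum(mu12[i][j] for i in range(n1)) for j in range(n2)]
    mu2_from_23 = [sum(mu23[j]) for j in range(n2)]
    if mu2_from_12 != mu2_from_23 or 0 in mu2_from_12:
        raise ValueError("couplings must share a middle marginal with positive masses")
    return [[[mu12[i][j] * mu23[j][k] / mu2_from_12[j] for k in range(n3)]
             for j in range(n2)] for i in range(n1)]


F = Fraction
mu12 = [[F(1, 4), F(1, 4)], [F(1, 2), F(0)]]  # marginals (1/2, 1/2) and (3/4, 1/4)
mu23 = [[F(1, 2), F(1, 4)], [F(0), F(1, 4)]]  # marginals (3/4, 1/4) and (1/2, 1/2)
mu = glue(mu12, mu23)
# Projecting mu back onto the first two and the last two coordinates:
print([[sum(mu[i][j]) for j in range(2)] for i in range(2)] == mu12)                       # True
print([[sum(mu[i][j][k] for i in range(2)) for k in range(2)] for j in range(2)] == mu23)  # True
```

The two printed checks are the finite versions of the statements that $\pi_{12}$ and $\pi_{23}$ map $\mu$ back to $\mu'$ and $\mu''$.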

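The projections $\pi_{12} : \Omega_1\times\Omega_2\times\Omega_3 \to \Omega_1\times\Omega_2$ and $\pi_{23} : \Omega_1\times\Omega_2\times\Omega_3 \to \Omega_2\times\Omega_3$ map $\mu$ to measures $\bar\mu'$ on $\Omega_1\times\Omega_2$ and $\bar\mu''$ on $\Omega_2\times\Omega_3$. We have, for any $i$ and $j$,

$$
\begin{aligned}
\bar\mu'(A_i\times B_j) &= \mu\bigl(\pi_{12}^{-1}(A_i\times B_j)\bigr) = \mu(A_i\times B_j\times\Omega_3) \\
&= \sum_k \frac{\mu'(A_i\times B_j)\,\mu''(B_j\times C_k)}{\mu_2(B_j)} \cdot \frac{\mu_1\times\mu_2\times\mu_3(A_i\times B_j\times C_k)}{\mu_1(A_i)\,\mu_2(B_j)\,\mu_3(C_k)} \\
&= \sum_k \frac{\mu'(A_i\times B_j)\,\mu''(B_j\times C_k)}{\mu_2(B_j)} = \mu'(A_i\times B_j).
\end{aligned}
$$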



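Hence, by Lemma 6.3,

$$
\|W_1^{\pi_1} - W_2^{\pi_2}\|_{\square,\Omega_1\times\Omega_2,\bar\mu'} = \|W_1^{\pi_1} - W_2^{\pi_2}\|_{\square,\Omega_1\times\Omega_2,\mu'}. \tag{6.7}
$$

Further, since $\pi_{12} : (\Omega_1\times\Omega_2\times\Omega_3,\mu) \to (\Omega_1\times\Omega_2,\bar\mu')$ is measure-preserving, Lemma 5.5 implies that (recall our generic use of $\pi_\ell$)

$$
\|W_1^{\pi_1} - W_2^{\pi_2}\|_{\square,\Omega_1\times\Omega_2\times\Omega_3,\mu} = \|W_1^{\pi_1} - W_2^{\pi_2}\|_{\square,\Omega_1\times\Omega_2,\bar\mu'}. \tag{6.8}
$$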
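Combining (6.5), (6.7) and (6.8), we find

$$
\|W_1^{\pi_1} - W_2^{\pi_2}\|_{\square,\mu} < \delta_\square(W_1,W_2) + \varepsilon. \tag{6.9}
$$

Similarly,

$$
\|W_2^{\pi_2} - W_3^{\pi_3}\|_{\square,\mu} < \delta_\square(W_2,W_3) + \varepsilon. \tag{6.10}
$$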

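We have reached our goal of finding suitable couplings on the same space, viz. $(\Omega_1\times\Omega_2\times\Omega_3,\mu)$, and we can now use the triangle inequality for $\|\cdot\|_\square$ and deduce

$$
\begin{aligned}
\delta_\square(W_1,W_3) &\le \|W_1^{\pi_1} - W_3^{\pi_3}\|_{\square,\mu} \le \|W_1^{\pi_1} - W_2^{\pi_2}\|_{\square,\mu} + \|W_2^{\pi_2} - W_3^{\pi_3}\|_{\square,\mu} \\
&< \delta_\square(W_1,W_2) + \delta_\square(W_2,W_3) + 2\varepsilon.
\end{aligned}
$$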

Since $\varepsilon > 0$ is arbitrary, this implies the desired inequality $\delta_\square(W_1,W_3) \le \delta_\square(W_1,W_2) + \delta_\square(W_2,W_3)$ in the case of step functions.

In general, we first approximate each $W_\ell$ by a step function $W_\ell'$ such that $\|W_\ell - W_\ell'\|_{L^1(\Omega_\ell^2)} < \varepsilon$. (We may assume, as we did above, that all sets in the partition have positive measure, by removing any null sets in them and redefining $W_\ell'$ on a null set.) The result for step functions together with several applications of Lemma 6.4 yields

$$
\begin{aligned}
\delta_\square(W_1,W_3) &< \delta_\square(W_1',W_3') + 2\varepsilon \le \delta_\square(W_1',W_2') + \delta_\square(W_2',W_3') + 2\varepsilon \\
&< \delta_\square(W_1,W_2) + \delta_\square(W_2,W_3) + 6\varepsilon.
\end{aligned}
$$

The result $\delta_\square(W_1,W_3) \le \delta_\square(W_1,W_2) + \delta_\square(W_2,W_3)$ follows. □

Corollary 6.6. Let, for $i = 1,2,3$, $\Omega_i$ be a probability space and $W_i \in L^1(\Omega_i^2)$. If $\delta_\square(W_1,W_2) = 0$, then $\delta_\square(W_1,W_3) = \delta_\square(W_2,W_3)$. (The same result holds for $\delta_1$.) □

Consider the class $\mathcal W^* := \bigcup_\Omega \mathcal W(\Omega)$ of all graphons (on any probability space). We define a relation $\cong$ on this class (or on the even larger class $\bigcup_\Omega L^1(\Omega^2)$) by

$$
W_1 \cong W_2 \quad\text{if}\quad \delta_\square(W_1,W_2) = 0. \tag{6.11}
$$

Corollary 6.6 shows that this is an equivalence relation, and that $\delta_\square$ is a true metric on the quotient space $\widehat{\mathcal W} := \mathcal W^*/\cong$. We say that two graphons $W_1, W_2$ are equivalent if $W_1 \cong W_2$, i.e., if $\delta_\square(W_1,W_2) = 0$. (We will see in Theorem 8.10 below that $\delta_1(W_1,W_2) = 0$ defines the same equivalence relation.)



It is a central fact in the graph limit theory [14] that this quotient space $\widehat{\mathcal W} := \mathcal W^*/\cong$ is homeomorphic to (and thus can be identified with) the set of graph limits; moreover, the metric space $(\widehat{\mathcal W},\delta_\square)$ is compact. (See also [24].) The compactness is closely related to Szemeredi's regularity lemma, see [49].

We will always regard $\widehat{\mathcal W}$ as a compact metric space equipped with the metric $\delta_\square$, except a few times when we explicitly use $\delta_1$ instead. Note that $\delta_1$ is a larger metric and thus gives a stronger topology. In particular, $(\widehat{\mathcal W},\delta_1)$ is not compact.

Example 6.7. If $W : \Omega^2 \to [0,1]$ is any graphon (or kernel) on a probability space $\Omega$, and $\varphi : \Omega' \to \Omega$ is a measure-preserving map, then, as remarked above, $W$ is equivalent to its pull-back $W^{\varphi}$.

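Example 6.8. Let $G$ be a graph with vertex set $V = \{1,\dots,n\}$, and consider the two graphons associated with $G$ in Example 2.7: one defined on $V$ (with the uniform probability measure) and one defined on $[0,1]$ by (2.2). Let $\varphi : (0,1] \to V$ be the map $x \mapsto \lceil nx\rceil$. Then $\varphi$ is measure-preserving, and (2.2) defines the graphon on $[0,1]$ as the pull-back by $\varphi$ of the graphon on $V$. Hence the two graphons are equivalent.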
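A small sketch of Example 6.8, assuming the graph is given by a 0/1 adjacency matrix; all helper names are hypothetical. It checks that $\varphi(x) = \lceil nx\rceil$ gives each vertex mass $1/n$ (so $\varphi$ is measure-preserving onto $V$ with the uniform measure) and, as one consequence of the equivalence, that both graphons have the same integral (edge density). Exact rational arithmetic is used; a midpoint rule on a grid of mesh $1/(mn)$ is exact here because the pull-back is constant on each square $((u-1)/n, u/n] \times ((w-1)/n, w/n]$.

```python
import math
from fractions import Fraction


def vertex(x, n):
    """The map phi(x) = ceil(n x) from (0, 1] onto V = {1, ..., n}."""
    return math.ceil(n * x)


def step_graphon(adj):
    """Pull-back of the adjacency function along phi: W(x, y) = adj(phi(x), phi(y))."""
    n = len(adj)
    return lambda x, y: adj[vertex(x, n) - 1][vertex(y, n) - 1]


def preimage_masses(n, m=4):
    """lambda(phi^{-1}(v)) for each vertex v, computed by counting grid midpoints.

    Exact, because each interval ((v-1)/n, v/n] contains exactly m of the
    N = m n midpoints (2k + 1) / (2N).
    """
    N = m * n
    counts = [0] * n
    for k in range(N):
        counts[vertex(Fraction(2 * k + 1, 2 * N), n) - 1] += 1
    return [Fraction(c, N) for c in counts]


def edge_density_V(adj):
    """Integral of the graphon on V with the uniform measure."""
    n = len(adj)
    return Fraction(sum(map(sum, adj)), n * n)


def edge_density_01(adj, m=4):
    """Integral of the pull-back over [0, 1]^2 (midpoint rule, exact for these steps)."""
    n, W = len(adj), step_graphon(adj)
    N = m * n
    pts = [Fraction(2 * k + 1, 2 * N) for k in range(N)]
    return Fraction(sum(W(x, y) for x in pts for y in pts), N * N)


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # the path 1 - 2 - 3
print(preimage_masses(3))                           # [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
print(edge_density_V(path), edge_density_01(path))  # 4/9 4/9
```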



We can now prove, following [14], that the definition (5.6) agrees with our definition (6.1) of the cut metric for $[0,1]$, and more generally for any atomless Borel spaces. We include several related versions; note that (i) is (6.1) and (v) is (5.6).



Theorem 6.9. Let $W_1$ and $W_2$ be two kernels defined on probability spaces $(\Omega_1,\mu_1)$ and $(\Omega_2,\mu_2)$, respectively. Then the following are the same, and thus all define $\delta_\square(W_1,W_2)$.

(i) For any $\Omega_1$ and $\Omega_2$,
$$
\inf_{\varphi_1,\varphi_2} \|W_1^{\varphi_1} - W_2^{\varphi_2}\|_{\square,\Omega,\mu},
$$
where the infimum is over all couplings (pairs of measure-preserving maps) $\varphi_1 : (\Omega,\mu) \to (\Omega_1,\mu_1)$ and $\varphi_2 : (\Omega,\mu) \to (\Omega_2,\mu_2)$.

(ii) For any $\Omega_1$ and $\Omega_2$,
$$
\inf_{\mu} \|W_1^{\pi_1} - W_2^{\pi_2}\|_{\square,\Omega_1\times\Omega_2,\mu},
$$
where $\pi_i : \Omega_1\times\Omega_2 \to \Omega_i$ is the projection and the infimum is over all measures $\mu$ on $\Omega_1\times\Omega_2$ having marginals $\mu_1$ and $\mu_2$.

(iii) For any $\Omega_1$ and $\Omega_2$, for $\delta_{\square,2}$,
$$
\inf_{\mu} \sup_{\|f\|_\infty,\|g\|_\infty\le 1} \Bigl| \int_{(\Omega_1\times\Omega_2)^2} \bigl(W_1(x_1,y_1) - W_2(x_2,y_2)\bigr) f(x_1,x_2)\,g(y_1,y_2)\,d\mu(x_1,x_2)\,d\mu(y_1,y_2) \Bigr|,
$$
taking the infimum over all measures $\mu$ on $\Omega_1\times\Omega_2$ having marginals $\mu_1$ and $\mu_2$; for $\delta_{\square,1}$ we further restrict to $f,g \ge 0$.

(iv) Provided $\Omega_1$ and $\Omega_2$ are atomless Borel spaces,
$$
\inf_{\varphi} \|W_1 - W_2^{\varphi}\|_\square,
$$
where the infimum is over all measure-preserving $\varphi : \Omega_1 \to \Omega_2$.

(v) Provided $\Omega_1$ and $\Omega_2$ are atomless Borel spaces,
$$
\inf_{\varphi} \|W_1 - W_2^{\varphi}\|_\square,
$$
where the infimum is over all measure-preserving bijections $\varphi : \Omega_1 \to \Omega_2$, i.e., over all rearrangements of $W_2$ defined on $\Omega_1$.

(vi) Provided $\Omega_1 = \Omega_2 = [0,1]$,
$$
\inf_{\sigma} \|W_1 - W_2^{\sigma}\|_\square,
$$
where the infimum is over all interval permutations $\sigma : [0,1] \to [0,1]$, defined by permutations $\sigma$ of $\{1,\dots,n\}$ with $n$ arbitrary.



Proof. (i) $\Longleftrightarrow$ (ii): We have shown this in (6.3) and the accompanying argument.

(ii) $\Longleftrightarrow$ (iii): Directly from the definition (4.3) (using Remark 4.1 for $\delta_{\square,1}$), writing $x = (x_1,x_2)$ and $y = (y_1,y_2)$. (The expression in (iii) is just the definition written out explicitly in this case.)

For (iv) and (v), we first note that by Theorem A.7, $\Omega_1$ and $\Omega_2$ are isomorphic to $[0,1]$ (equipped with Lebesgue measure), i.e., there are measure-preserving bijections $\psi_j : [0,1] \to \Omega_j$. It is evident that we may use these maps to transfer the problem to the pull-backs $W_1^{\psi_1}$ and $W_2^{\psi_2}$ on $[0,1]$. In other words, we may in (iv) and (v) assume that $\Omega_1 = \Omega_2 = [0,1]$.

In this case, denote the quantities in (i)–(vi) by $\delta_\square^{(i)},\dots,\delta_\square^{(vi)}$. We have $\delta_\square^{(i)} \le \delta_\square^{(iv)} \le \delta_\square^{(v)} \le \delta_\square^{(vi)}$, since we take infima over smaller and smaller sets of maps. Further, we have shown that $\delta_\square^{(i)} = \delta_\square^{(ii)} = \delta_\square^{(iii)}$. To complete the proof, it thus suffices to show that $\delta_\square^{(vi)} \le \delta_\square^{(ii)}$.



Let $\varepsilon > 0$ and let $I_{iN}$ denote the interval $((i-1)/N, i/N]$, for $1 \le i \le N$. The set of step functions $W : [0,1]^2 \to \mathbb R$ that correspond to partitions of $[0,1]$ (or rather $(0,1]$, but the difference does not matter here) into $m$ equally long intervals $I_{1m},\dots,I_{mm}$, for $m = 1,2,\dots$, is a dense subset of $L^1([0,1]^2)$. Hence we may choose $m \ge 1$ and two such step functions $W_1'$ and $W_2'$ so that $\|W_j - W_j'\|_1 < \varepsilon$, $j = 1,2$. (We may first obtain such $W_j'$ with different $m_1$ and $m_2$, but we may then replace both by $m := m_1 m_2$.) By Lemma 6.4 and its proof, which applies to all the versions $\delta_\square^{(i)},\dots,\delta_\square^{(vi)}$, we have

$$
\delta_\square^{*}(W_1,W_2) - 2\varepsilon < \delta_\square^{*}(W_1',W_2') < \delta_\square^{*}(W_1,W_2) + 2\varepsilon \tag{6.12}
$$

for every $* = (i),\dots,(vi)$.



Choose a probability measure $\mu$ on $\Omega_1\times\Omega_2 = [0,1]^2$ such that $\|(W_1')^{\pi_1} - (W_2')^{\pi_2}\|_{\square,\mu} < \delta_\square^{(ii)}(W_1',W_2') + \varepsilon$. We may evaluate this cut norm by (6.4) (replacing $W_\ell$ by $W_\ell'$), and, as asserted in Lemma 6.3, the cut norm depends only on the numbers $\mu(C_{ij})$, where now $C_{ij} := \pi_1^{-1}(I_{im}) \cap \pi_2^{-1}(I_{jm}) = I_{im}\times I_{jm}$; so we may assume that on each square $C_{ij}$ the coupling measure $\mu$ equals a constant factor $\lambda_{ij}$ times the Lebesgue measure. (Hence, $\mu(C_{ij}) = \lambda_{ij}/m^2$.) We adjust these factors so that every $\mu(C_{ij})$ is rational; we may do this so that the marginals are still correct, i.e., for every $i$ and $j$,

$$
\sum_{l} \mu(C_{il}) = \sum_{l} \mu(C_{lj}) = \frac{1}{m}. \tag{6.13}
$$

The adjustment changes the cut norm in (6.4) by an arbitrarily small amount, so we can do this and still have $\|(W_1')^{\pi_1} - (W_2')^{\pi_2}\|_{\square,\mu} < \delta_\square^{(ii)}(W_1',W_2') + \varepsilon$.

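We now have $\mu(C_{ij}) = a_{ij}/N$ for some integers $N$ and $a_{ij}$, $1 \le i,j \le m$. Let $b := N/m$. By (6.13), for every $i$ and $j$,

$$
\sum_{l=1}^{m} a_{il} = \sum_{l=1}^{m} a_{lj} = \frac{N}{m} = b. \tag{6.14}
$$

Hence, $b$ is an integer, and thus every interval $I_{im}$ is a union $\bigcup_{k=b(i-1)+1}^{bi} I_{kN}$ of $b$ intervals $I_{kN}$ of length $1/N$. By (6.14), we may construct a permutation $\sigma$ of $\{1,\dots,N\}$ such that $\sigma$ maps exactly $a_{ij}$ of the indices $k \in [b(i-1)+1, bi]$ into $[b(j-1)+1, bj]$, for all $i$ and $j$. Hence, $\lambda\bigl(I_{im} \cap \sigma^{-1}(I_{jm})\bigr) = a_{ij}/N = \mu(C_{ij})$. Thus, Lemma 6.3 applies to the couplings $(\pi_1,\pi_2)$ and $(\iota,\sigma)$ (defined on $[0,1]$), where $\iota$ is the identity map; hence,

$$
\delta_\square^{(vi)}(W_1',W_2') \le \|W_1' - (W_2')^{\sigma}\|_\square = \|(W_1')^{\pi_1} - (W_2')^{\pi_2}\|_{\square,\mu} < \delta_\square^{(ii)}(W_1',W_2') + \varepsilon.
$$

Finally, (6.12) yields $\delta_\square^{(vi)}(W_1,W_2) < \delta_\square^{(ii)}(W_1,W_2) + 5\varepsilon$, and the result follows since $\varepsilon$ is arbitrary. □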
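The combinatorial step in the proof above, turning an integer matrix $(a_{ij})$ with all row and column sums equal to $b$ into a permutation $\sigma$ of $\{1,\dots,N\}$, can be done greedily: assign the indices of each source block in order, taking $a_{ij}$ unused indices from each target block $j$; since every column sum is $b$, no target block runs out. A minimal sketch with hypothetical helper names and 0-based block numbering:

```python
def block_permutation(a):
    """Greedy construction of the permutation sigma used in the proof above.

    a is an m x m matrix of non-negative integers whose row and column sums
    all equal b, so N = m b.  Block i (0-based) is {b i + 1, ..., b (i + 1)},
    the indices k of the intervals I_{kN} that make up the (i+1)-th interval
    of length 1/m.  Returns sigma as a dict on {1, ..., N} that sends exactly
    a[i][j] indices of block i into block j.
    """
    m = len(a)
    b = sum(a[0])
    rows_ok = all(sum(row) == b for row in a)
    cols_ok = all(sum(a[i][j] for i in range(m)) == b for j in range(m))
    if not (rows_ok and cols_ok):
        raise ValueError("all row and column sums must equal b")
    # Unused target indices in each block; column sum b means no list runs dry.
    free = [list(range(b * j + 1, b * (j + 1) + 1)) for j in range(m)]
    sigma = {}
    for i in range(m):
        k = b * i + 1  # next unassigned source index in block i
        for j in range(m):
            for _ in range(a[i][j]):
                sigma[k] = free[j].pop()
                k += 1
    return sigma


def block_counts(sigma, b, m):
    """counts[i][j] = number of k in block i with sigma(k) in block j."""
    counts = [[0] * m for _ in range(m)]
    for k, s in sigma.items():
        counts[(k - 1) // b][(s - 1) // b] += 1
    return counts


a = [[2, 1], [1, 2]]  # m = 2, b = 3, N = 6
sigma = block_permutation(a)
print(sorted(sigma.values()) == list(range(1, 7)))  # True: sigma is a permutation
print(block_counts(sigma, 3, 2) == a)               # True
```

The second check is the discrete form of $\lambda\bigl(I_{im} \cap \sigma^{-1}(I_{jm})\bigr) = a_{ij}/N$.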

Remark 6.10. On spaces with atoms, the quantities $\delta_\square^{(iv)}$ and $\delta_\square^{(v)}$ defined in (iv) and (v) are in general different from $\delta_\square$, see Example 5.6. (In this case, they are larger than $\delta_\square$, see Remark 6.2.) Furthermore, for two general probability spaces $\Omega_1$ and $\Omega_2$, it is possible that there are no measure-preserving maps $\Omega_1 \to \Omega_2$ at all, in which case the definitions in (iv) and (v) are not appropriate; and even if we interpret $\delta_\square^{(iv)}$ as having the default value 1 in such cases (for graphons; for general kernels we would have to use $\infty$), $\delta_\square^{(iv)}$ is not even symmetric in general. (For an example, modify Example 5.6 by replacing $W_2$ by a pull-back defined on $[0,1]$; then there are measure-preserving maps $\Omega_2 \to \Omega_1$ but not conversely. Thus $\delta_\square^{(iv)}(W_1,W_2)$ takes the default value 1, whereas $\delta_\square^{(iv)}(W_2,W_1)$ is an infimum over a non-empty set of maps.)

Remark 6.11. In (iv), it suffices that $\Omega_1$ and $\Omega_2$ are Borel spaces such that $\Omega_1$ is atomless. To see this, replace $W_2$ by its pull-back $\widetilde W_2 := W_2^{\pi}$ defined on the atomless Borel space $\widetilde\Omega_2 := \Omega_2\times[0,1]$, where $\pi$ is the projection onto $\Omega_2$.



Remark 6.12. (iv) and (v) hold also for atomless Lebesgue spaces, since then, for $\ell = 1,2$, $\Omega_\ell = (\Omega_\ell,\mathcal F_\ell,\mu_\ell)$ is the completion of some Borel space $\Omega_\ell^0 = (\Omega_\ell,\mathcal F_\ell^0,\mu_\ell)$, and we may replace $W_\ell$ by a kernel $W_\ell^0$ on $\Omega_\ell^0$ with $W_\ell = W_\ell^0$ a.e., cf. Remark 2.5; note that every measure-preserving map $\varphi : \Omega_1^0 \to \Omega_2^0$ is also measure-preserving $\Omega_1 \to \Omega_2$.

Remark 6.13. An obvious analogue of Theorem 6.9 holds for $\delta_1$. (In (iii) the integral is $\int_{(\Omega_1\times\Omega_2)^2} |W_1(x_1,y_1) - W_2(x_2,y_2)|\,d\mu(x_1,x_2)\,d\mu(y_1,y_2)$, and there are no $f$ and $g$.)



Remark 6.14. In probabilistic notation, see Remark 6.1, (iii) can be written as

$$
\inf \sup_{\|f\|_\infty,\|g\|_\infty\le 1} \Bigl| \mathbb E\Bigl( \bigl(W_1(X_1',X_1'') - W_2(X_2',X_2'')\bigr) f(X_1',X_2')\, g(X_1'',X_2'') \Bigr) \Bigr|,
$$

where the infimum is taken over all couplings $(X_1',X_2')$ of two random variables $X_1$ and $X_2$ such that $X_\ell$ is $\Omega_\ell$-valued and has distribution $\mu_\ell$, and $(X_1'',X_2'')$ is an independent copy of $(X_1',X_2')$.

Corollary 6.15. Let $\Omega$ be an atomless Borel space, e.g. $[0,1]$, and let $W$ be a graphon on $\Omega$. Then the equivalence class of all graphons on $\Omega$ equivalent to $W$ equals the closure of the orbit of $W$ under measure-preserving bijections (or maps); i.e.,

$$
\{W' \in \mathcal W(\Omega) : W' \cong W\} = \overline{\{W^{\varphi} : \varphi \in S_{\mathrm{mp}}\}} = \overline{\{W^{\varphi} : \varphi \in S_{\mathrm{mpb}}\}},
$$

where $S_{\mathrm{mp}}$ is the set of all measure-preserving $\varphi : \Omega \to \Omega$, and $S_{\mathrm{mpb}}$ is the subset of all measure-preserving bijections. The closure may here be taken either for the cut norm or for the $L^1$ norm.



Proof. For the closure in cut norm, this follows from Theorem 6.9(iv) and (v). By Remark 6.13, the same holds for the closure in $L^1$ norm and the equivalence class $\{W' \in \mathcal W(\Omega) : \delta_1(W',W) = 0\}$. However, by Theorem 8.10 below, $\delta_1$ and $\delta_\square$ define the same equivalence classes. □

Remark 6.10 shows that the (equivalent) definitions in Theorem 6.9(i)–(iii) are the only ones useful for general probability spaces. Another advantage of them is that, as shown by Bollobas and Riordan [13], the infima are attained, at least for Borel spaces. (This is not true in general for the versions (iv)–(vi), not even in the special case when the infimum is 0; see Example 8.1 below.)

Theorem 6.16. Let $W_1$ and $W_2$ be two kernels defined on Borel probability spaces $(\Omega_1,\mu_1)$ and $(\Omega_2,\mu_2)$, respectively. Then the infima in Theorem 6.9(i)–(iii) are attained. In other words, there exists a probability measure $\mu$ on $\Omega_1\times\Omega_2$ with marginals $\mu_1$ and $\mu_2$, and thus a corresponding coupling $(\pi_1,\pi_2)$, such that $\delta_\square(W_1,W_2) = \|W_1^{\pi_1} - W_2^{\pi_2}\|_{\square,\Omega_1\times\Omega_2,\mu}$.

Proof. By Theorem A.4 and Remark A.5, every Borel measurable space is either countable or isomorphic to the Cantor cube $C := \{0,1\}^\infty$. Hence, we may without loss of generality assume that each of the two spaces $\Omega_\ell$ (where, as in the rest of the proof, $\ell = 1,2$) is either a finite set, the countable set $\{0\} \cup \{1/n : n \in \mathbb N\}$, or $C$, equipped with some probability measure $\mu_\ell$. Note that in every case $\Omega_\ell$ is a compact metric space.

For $\ell_1,\ell_2 \in \{1,2\}$, let $\mathcal A(\Omega_{\ell_1}\times\Omega_{\ell_2})$ be the set of all step functions on $\Omega_{\ell_1}\times\Omega_{\ell_2}$ corresponding to partitions $\Omega_{\ell_m} = \bigcup_i A_{im}$ where every part $A_{im}$ is clopen (closed and open) in $\Omega_{\ell_m}$, $m = 1,2$. (We extend here the definition of step functions on $\Omega\times\Omega$ to products of two different spaces in the natural way.) For the spaces we consider, $\mathcal A(\Omega_{\ell_1}\times\Omega_{\ell_2})$ is dense in $L^1(\Omega_{\ell_1}\times\Omega_{\ell_2},\mu)$ for any probability measure $\mu$ on the product. (This is the reason why we replaced $[0,1]$ by the totally disconnected space $C$. It is possible to use $[0,1]$ instead, with minor modifications, see [12].)



Denote the integral in Theorem 6.9(iii) by $\Phi(W_1,W_2,f,g,\mu)$. By Theorem 6.9, there exist probability measures $\nu_n$ on $\Omega_1\times\Omega_2$ such that

$$
\sup_{\|f\|_\infty,\|g\|_\infty\le 1} |\Phi(W_1,W_2,f,g,\nu_n)| < \delta_\square(W_1,W_2) + 1/n. \tag{6.15}
$$

(For $\delta_{\square,1}$, we tacitly assume that $f,g \ge 0$.)

Since $\Omega_1$ and $\Omega_2$ are compact metric spaces, $\Omega_1\times\Omega_2$ is too. Hence, the set of probability measures on $\Omega_1\times\Omega_2$ is compact and metrizable (see 0]), so there exists a subsequence of $(\nu_n)$ that converges (in the usual weak topology) to some probability measure $\nu$ on $\Omega_1\times\Omega_2$. We consider in the sequel this subsequence only.

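Let $\varepsilon > 0$. By the remarks above, we may find $W_\ell' \in \mathcal A(\Omega_\ell^2)$ with $\|W_\ell - W_\ell'\|_{L^1(\Omega_\ell\times\Omega_\ell)} < \varepsilon$, and hence, assuming $\|f\|_\infty,\|g\|_\infty \le 1$,

$$
\begin{aligned}
|\Phi(W_1,W_2,f,g,\nu)| &\le |\Phi(W_1',W_2',f,g,\nu)| + \|W_1 - W_1'\|_{L^1} + \|W_2 - W_2'\|_{L^1} \\
&< |\Phi(W_1',W_2',f,g,\nu)| + 2\varepsilon
\end{aligned} \tag{6.16}
$$

and similarly, for every $n$ and every $f,g$ with $\|f\|_\infty,\|g\|_\infty \le 1$,

$$
|\Phi(W_1',W_2',f,g,\nu_n)| < |\Phi(W_1,W_2,f,g,\nu_n)| + 2\varepsilon. \tag{6.17}
$$

Since $W_1'$ and $W_2'$ are step functions, they are bounded, so there exists some $M$ with $\|W_\ell'\|_\infty \le M$. For any $f$ and $g$ with $\|f\|_\infty,\|g\|_\infty \le 1$, we may similarly find $f'$ and $g'$ in $\mathcal A(\Omega_1\times\Omega_2)$, with $\|f'\|_\infty,\|g'\|_\infty \le 1$, such that $\|f - f'\|_{L^1(\nu)}, \|g - g'\|_{L^1(\nu)} < \varepsilon/M$. It follows that

$$
|\Phi(W_1',W_2',f,g,\nu)| < |\Phi(W_1',W_2',f',g',\nu)| + 4\varepsilon. \tag{6.18}
$$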

Since $W_1'$, $W_2'$, $f'$, $g'$ all are step functions in the sets $\mathcal A$, the integral $\Phi(W_1',W_2',f',g',\mu)$ can be written as a linear combination of integrals

$$
\int_{A_i\times B_j\times A_k\times B_m} d\mu(x_1,x_2)\,d\mu(y_1,y_2) = \mu(A_i\times B_j)\,\mu(A_k\times B_m),
$$

where further the sets $A_i, B_j, A_k, B_m$ are clopen. Hence each term, and thus $\Phi(W_1',W_2',f',g',\mu)$, is a continuous functional of $\mu$; consequently,

$$
\Phi(W_1',W_2',f',g',\nu_n) \to \Phi(W_1',W_2',f',g',\nu). \tag{6.19}
$$

By (6.17) and (6.15),

$$
|\Phi(W_1',W_2',f',g',\nu_n)| < \delta_\square(W_1,W_2) + 1/n + 2\varepsilon, \tag{6.20}
$$

and thus (6.19) yields

$$
|\Phi(W_1',W_2',f',g',\nu)| \le \delta_\square(W_1,W_2) + 2\varepsilon \tag{6.21}
$$

and, by (6.16) and (6.18),

$$
|\Phi(W_1,W_2,f,g,\nu)| < |\Phi(W_1',W_2',f',g',\nu)| + 6\varepsilon \le \delta_\square(W_1,W_2) + 8\varepsilon. \tag{6.22}
$$
Since $\varepsilon$ is arbitrary, we thus obtain $|\Phi(W_1,W_2,f,g,\nu)| \le \delta_\square(W_1,W_2)$ and

$$
\|W_1^{\pi_1} - W_2^{\pi_2}\|_{\square,\nu} = \sup_{\|f\|_\infty,\|g\|_\infty\le 1} |\Phi(W_1,W_2,f,g,\nu)| \le \delta_\square(W_1,W_2), \tag{6.23}
$$

which shows that equality is attained in Theorem 6.9(ii)–(iii) by $\nu$, and in Theorem 6.9(i) by the coupling $(\pi_1,\pi_2)$ defined on $(\Omega_1\times\Omega_2,\nu)$. □

The assumption that the spaces are Borel (or Lebesgue, see Remark 6.12) really is essential here, even when the infimum $\delta_\square(W_1,W_2) = 0$; in Example 8.13 we will see an example of two equivalent kernels such that none of the infima in Theorem 6.9 is attained.



7. Representation on [0, 1] 

As said in the introduction, many papers consider only kernels or graphons on $[0,1] = ([0,1],\lambda)$. This is justified by the fact that every kernel [graphon] is equivalent to such a kernel [graphon]. (See [38] for a generalization.)



Theorem 7.1. Every kernel [graphon] on a probability space $(\Omega,\mathcal F,\mu)$ is equivalent to a kernel [graphon] on $([0,1],\lambda)$.

Corollary 7.2. The quotient space $\widehat{\mathcal W} := \bigcup_\Omega \mathcal W(\Omega)/\cong$, which as said above can be identified with the space of graph limits, can as well be defined as $\widehat{\mathcal W} := \mathcal W([0,1])/\cong$.

Before proving Theorem 7.1 we prove a partial result.

Lemma 7.3. Every kernel [graphon] on a probability space $(\Omega,\mathcal F,\mu)$ is a pull-back of a kernel [graphon] on some Borel probability space.

Proof. Let $W : \Omega^2 \to [0,\infty)$ be a kernel. Since $W$ is measurable, each set $E_r := \{(x,y) : W(x,y) < r\}$, where $r \in \mathbb R$, belongs to $\mathcal F\times\mathcal F$, and it follows that there exists a countable subset $\mathcal A_r \subseteq \mathcal F$ such that $E_r \in \mathcal F(\mathcal A_r)\times\mathcal F(\mathcal A_r)$, where $\mathcal F(\mathcal A_r)$ is the $\sigma$-field generated by $\mathcal A_r$. Hence, if $\mathcal F_0$ is the $\sigma$-field generated by the countable set $\mathcal A := \bigcup_{r\in\mathbb Q} \mathcal A_r$, then $\mathcal F_0 \subseteq \mathcal F$ and $W$ is $\mathcal F_0\times\mathcal F_0$-measurable.

List the elements of $\mathcal A$ as $\{A_1, A_2, \dots\}$. (If $\mathcal A$ is finite, we for convenience repeat some element.) Let $C := \{0,1\}^\infty$ be the Cantor cube (see Remark A.5) and define a map $\varphi : \Omega \to C$ by $\varphi(x) = (\mathbf 1\{x \in A_i\})_{i=1}^\infty$. Let $\nu$ be the probability measure on $C$ that makes $\varphi : \Omega \to C$ measure-preserving, see Remark 5.4.

The $\sigma$-field on $\Omega$ generated by $\varphi$ equals $\mathcal F_0$, and thus the $\sigma$-field on $\Omega\times\Omega$ generated by $(\varphi,\varphi) : \Omega^2 \to C^2$ equals $\mathcal F_0\times\mathcal F_0$. Since $W$ is measurable for this $\sigma$-field, $W$ equals $V\circ(\varphi,\varphi) = V^{\varphi}$ for some measurable $V : C^2 \to [0,\infty)$. Since $W$ is symmetric, we may here replace $V(x,y)$ by $\frac12\bigl(V(x,y) + V(y,x)\bigr)$ and thus assume that also $V$ is symmetric. Hence, $V$ is a kernel on $C$ and $W = V^{\varphi}$. If $W$ is a graphon, we may assume that $V : C^2 \to [0,1]$, and thus $V$ too is a graphon. This proves the result with the Borel probability space $(C,\nu)$. □
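For a finite probability space the construction in the proof of Lemma 7.3 can be carried out literally. The sketch below is an illustration under simplifying assumptions: it uses a hypothetical helper `cantor_coding` and finitely many sets instead of a countable family, so the image lies in $\{0,1\}^r$ rather than the full Cantor cube. It computes $\varphi(x) = (\mathbf 1\{x\in A_i\})_i$, the push-forward $\nu$, and the function $V$ with $W = V^{\varphi}$, and raises an error if $W$ is not measurable for the generated $\sigma$-field.

```python
from fractions import Fraction


def cantor_coding(points, prob, sets, W):
    """Finite illustration of the map phi(x) = (1{x in A_i})_i used above.

    points: elements of a finite Omega; prob: dict x -> mass;
    sets: the list A_1, ..., A_r (Python sets); W: dict (x, y) -> value.
    Returns (phi, nu, V): nu is the push-forward of prob under phi (so phi is
    measure-preserving onto its image) and V satisfies W(x, y) = V(phi(x), phi(y)).
    """
    def phi(x):
        return tuple(int(x in A) for A in sets)

    nu = {}
    for x in points:
        nu[phi(x)] = nu.get(phi(x), 0) + prob[x]
    V = {}
    for x in points:
        for y in points:
            key = (phi(x), phi(y))
            # W must be constant on products of atoms of the sigma-field
            # generated by the sets, i.e. F_0 x F_0-measurable.
            if V.setdefault(key, W[x, y]) != W[x, y]:
                raise ValueError("W is not measurable for the generated sigma-field")
    return phi, nu, V


points = ["a", "b", "c", "d"]
prob = {x: Fraction(1, 4) for x in points}
A = {"a", "b"}
W = {(x, y): int((x in A) == (y in A)) for x in points for y in points}
phi, nu, V = cantor_coding(points, prob, [A], W)
print(nu)  # {(1,): Fraction(1, 2), (0,): Fraction(1, 2)}
print(all(W[x, y] == V[phi(x), phi(y)] for x in points for y in points))  # True
```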



Proof of Theorem 7.1. Let $W$ be a kernel on some probability space $\Omega$. By Lemma 7.3, $W = V^{\varphi}$ for some kernel $V$ on a Borel probability space $(\Omega_1,\nu)$. (With $\Omega_1 = C$ in the proof above.) If $\nu$ is atomless, the result follows by Theorem A.7. In general, let $\Omega_2 := \Omega_1\times[0,1]$, with product measure $\nu_2$. The projection $\pi : \Omega_2 \to \Omega_1$ is measure-preserving, so $V \cong V_2 := V^{\pi}$. Moreover, $(\Omega_2,\nu_2)$ is an atomless Borel probability space, so by Theorem A.7 there exists a measure-preserving bijection $\psi : [0,1] \to \Omega_2$. Hence $U := V_2^{\psi}$ is a kernel [graphon] on $[0,1]$ and $U \cong V_2 \cong V \cong W$. □

Remark 7.4. If $\mu$ is atomless, we may by Lemma A.1 find an increasing family of sets $B_r \subseteq \Omega$, $r \in [0,1]$, such that $\mu(B_r) = r$. In the construction in the proof of Lemma 7.3, we may add each $B_r$ with rational $r$ to the family $\mathcal A$. Then the measure $\nu$ on $C$ is atomless, because if $x$ were an atom, then $E := \varphi^{-1}\{x\}$ would be a subset of $\Omega$ with $\mu(E) > 0$ such that for each rational $r$, either $E \subseteq B_r$ or $E \cap B_r = \emptyset$, but this leads to a contradiction as in the proof of Lemma A.3. Consequently, we can then use Theorem A.7 directly to find a measure-preserving bijection $\psi : [0,1] \to C$, and a kernel $U := V^{\psi}$ on $[0,1]$ such that $W = V^{\varphi} = U^{\psi^{-1}\circ\varphi}$. Consequently, every kernel on an atomless probability space $(\Omega,\mu)$ is a pull-back of a kernel on $[0,1]$, which combines and improves Lemma 7.3 and Theorem 7.1 in this case. (Conversely, by Lemma A.3, no kernel on a space $(\Omega,\mu)$ with atoms is a pull-back of a kernel on $[0,1]$.)

8. Equivalence 

We have seen that if $W_1$ and $W_2$ are two kernels on some probability spaces $\Omega_1$ and $\Omega_2$, and $W_1 = W_2^{\varphi}$ (or just $W_1 = W_2^{\varphi}$ a.e.) for some measure-preserving $\varphi : \Omega_1 \to \Omega_2$, then $W_1 \cong W_2$. The converse does not hold, as shown by the following standard examples [13].

Example 8.1. Let $\varphi : [0,1] \to [0,1]$ be given by $\varphi(x) = 2x \bmod 1$. Take $W_1(x,y) = xy$, and $W_2 := W_1^{\varphi}$. Then $W_1$ and $W_2$ are graphons on $[0,1]$, and $\delta_\square(W_1,W_2) = 0$. However, there is no measure-preserving $\psi : [0,1] \to [0,1]$ such that $W_1 = W_2^{\psi}$ a.e., and as a consequence, the infima in Theorem 6.9(iv)–(vi) are not attained. (See Lemma 4.7.) In fact, if such a $\psi$ existed, then $W_1 = (W_1^{\varphi})^{\psi} = W_1^{\varphi\circ\psi}$ a.e., which implies (e.g. by considering the marginal $\int_0^1 W_1(x,y)\,dy = x/2$) that $\varphi(\psi(x)) = x$ a.e., and thus $\psi(x) \in \{x/2, x/2 + 1/2\}$ a.e. However, if $E := \psi^{-1}([0,1/2])$, it follows that for any $a$ and $b$ with $0 \le a < b \le 1$, $E \cap [a,b] = \psi^{-1}([a/2,b/2])$ (a.e.), so $\lambda(E\cap[a,b])/(b-a) = 1/2$. In particular, for every $x \in (0,1)$, the density $\lim_{\varepsilon\to0} \lambda\bigl(E\cap(x-\varepsilon,x+\varepsilon)\bigr)/2\varepsilon = 1/2$. On the other hand, by the Lebesgue density theorem, this density is 1 for a.e. $x \in E$ and 0 for a.e. $x \notin E$, a contradiction.

Example 8.2. More generally, let $\varphi_n : [0,1] \to [0,1]$ be given by $\varphi_n(x) = nx \bmod 1$, and define $W_n := W^{\varphi_n}$ with the same $W(x,y) = xy$ as in Example 8.1. If $W_n = W_m^{\psi}$ a.e., then (considering marginals as in Example 8.1) $m\psi(x) \equiv nx \pmod 1$ a.e. Let $E := \psi^{-1}((0,1/m))$. Then, for $0 < a < b < 1/m$, $\psi^{-1}([a,b]) = \bigcup_{j=0}^{n-1} E \cap \bigl([ma/n, mb/n] + j/n\bigr)$ (a.e.), and thus, if $\psi$ is measure-preserving,

$$
b - a = \lambda\bigl(\psi^{-1}([a,b])\bigr) = \sum_{j=0}^{n-1} \lambda\Bigl(E \cap \Bigl[\frac{ma+j}{n}, \frac{mb+j}{n}\Bigr]\Bigr).
$$

Divide by $b-a$, take $a = x-\varepsilon$ and $b = x+\varepsilon$, and let $\varepsilon \to 0$. The Lebesgue differentiation theorem implies that for a.e. $x \in (0,1/m)$,

$$
1 = \frac{m}{n} \sum_{j=0}^{n-1} \mathbf 1\Bigl\{\frac{mx+j}{n} \in E\Bigr\}.
$$

Since the sum is an integer for each $x$, this implies that $n$ is a multiple of $m$. Conversely, if $n = m\ell$ for an integer $\ell$, then $\varphi_n = \varphi_m\circ\varphi_\ell$, and thus $W_n = W^{\varphi_m\circ\varphi_\ell} = (W^{\varphi_m})^{\varphi_\ell} = W_m^{\varphi_\ell}$. Consequently, all $W_n$ are equivalent (being pull-backs of $W_1 = W$), and there exists a measure-preserving $\psi : [0,1] \to [0,1]$ such that $W_n = W_m^{\psi}$ a.e. if and only if $n$ is a multiple of $m$. In particular, $W_2$ and $W_3$ are equivalent, but neither of them is a pull-back of the other.
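The identity $\varphi_n = \varphi_m\circ\varphi_\ell$ for $n = m\ell$, and the resulting relation $W_n = W_m^{\varphi_\ell}$, can be spot-checked with exact rational arithmetic. This is only a check on a finite grid, not a proof, and the function names are illustrative:

```python
from fractions import Fraction


def phi(n, x):
    """phi_n(x) = n x mod 1, computed exactly for rational x in [0, 1]."""
    return (n * x) % 1


def W(x, y):
    """The graphon W(x, y) = x y of Example 8.1."""
    return x * y


def W_n(n):
    """W_n = W^{phi_n}, the pull-back of W along phi_n."""
    return lambda x, y: W(phi(n, x), phi(n, y))


grid = [Fraction(k, 60) for k in range(60)]
# phi_6 = phi_2 o phi_3, hence W_6 = (W_2)^{phi_3}:
print(all(phi(6, x) == phi(2, phi(3, x)) for x in grid))                               # True
print(all(W_n(6)(x, y) == W_n(2)(phi(3, x), phi(3, y)) for x in grid for y in grid))  # True
```

No such check can produce a map $\psi$ with $W_2 = W_3^{\psi}$; the argument above shows that none exists.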

However, equivalence is characterized by sequences of pull-backs. We 
begin with a simple result. 

Theorem 8.3. Let $W'$ and $W''$ be kernels defined on probability spaces $\Omega'$ and $\Omega''$. Then $W' \cong W''$, i.e. $\delta_\square(W',W'') = 0$, if and only if there exists a finite sequence of kernels $W_i$ defined on probability spaces $\Omega_i$, $i = 0,\dots,n$, with $W_0 = W'$ and $W_n = W''$, such that for each $i \ge 1$, either $W_{i-1} = W_i^{\varphi_i}$ a.e. for some measure-preserving $\varphi_i : \Omega_{i-1} \to \Omega_i$, or $W_i = W_{i-1}^{\varphi_i}$ a.e. for some measure-preserving $\varphi_i : \Omega_i \to \Omega_{i-1}$.

Proof. Suppose that $W' \cong W''$. We show that we can construct such a sequence with $n = 4$. We thus take $W_0 := W'$ and $W_4 := W''$. By Lemma 7.3, we can find $W_1$ and $W_3$ on Borel probability spaces $\Omega_1$ and $\Omega_3$ such that $W_0 = W_1^{\varphi_1}$ and $W_4 = W_3^{\varphi_4}$ for some measure-preserving $\varphi_1$ and $\varphi_4$. Then $W_1 \cong W_0 \cong W_4 \cong W_3$, so $\delta_\square(W_1,W_3) = 0$. By Theorem 6.16, there exists a probability measure $\mu$ on $\Omega_1\times\Omega_3$ such that $\|W_1^{\pi_1} - W_3^{\pi_3}\|_{\square,\Omega_1\times\Omega_3,\mu} = 0$, where $\pi_j$ is the projection onto $\Omega_j$. Thus, by Lemma 4.7, $W_1^{\pi_1} = W_3^{\pi_3}$ a.e. Hence, we can take $\Omega_2 := (\Omega_1\times\Omega_3,\mu)$, $\varphi_2 := \pi_1$, $\varphi_3 := \pi_3$ and

$$
W_2 := W_1^{\pi_1} = W_3^{\pi_3}.
$$

The converse is obvious by Example 6.7 and Corollary 6.6. □

Example 8.2 shows that we cannot in general do with a single pull-back in Theorem 8.3. However, we can always do with a chain of length 2 in Theorem 8.3. In fact, Borgs, Chayes and Lovasz [13] proved the following, more precise and much more difficult, result. (We will not use this theorem later; the simpler Theorem 8.3 is sufficient for our applications.)

Theorem 8.4. Let $W_1$ and $W_2$ be kernels defined on probability spaces $\Omega_1$ and $\Omega_2$. Then $W_1 \cong W_2$, i.e. $\delta_\square(W_1,W_2) = 0$, if and only if there exists a kernel $W$ on some probability space $\Omega$, and measure-preserving maps $\varphi_j : \Omega_j \to \Omega$ such that $W_j = W^{\varphi_j}$ a.e., $j = 1,2$.

We can always take $\Omega$ to be a Borel space. If $\Omega_1$ and $\Omega_2$ are atomless, we may take $\Omega = [0,1]$.

Proof. It suffices to prove the theorem for graphons $W_1$ and $W_2$; the general case follows easily by considering the transformations $W_1/(1+W_1)$ and $W_2/(1+W_2)$.

We give a proof in Section 9. (Except for the final statement, which is shown below.) See also Borgs, Chayes and Lovasz [13] for the long and technical original proof. In their formulation, the space $\Omega$ is constructed as a Lebesgue space, and the maps $\varphi_j$ are only assumed to be measurable from the completions $(\Omega_j,\overline{\mathcal F}_j,\mu_j)$ of $\Omega_j$ to $\Omega$. However, this is easily seen to be equivalent: If $\Omega$ is such a Lebesgue space, then $\Omega = (\Omega,\mathcal F,\mu)$ is the completion of some Borel space $\Omega_0 = (\Omega,\mathcal F_0,\mu)$. We may replace $W$ by an a.e. equal kernel that is $\mathcal F_0\times\mathcal F_0$-measurable, i.e., a kernel on the Borel space $\Omega_0$. Further, since every Borel measurable space is isomorphic to a Borel subset of $[0,1]$, see Theorem A.4, the map $\varphi_j : \Omega_j \to \Omega_0$, which is $\overline{\mathcal F}_j$-measurable, is a.e. equal to an $\mathcal F_j$-measurable map $\varphi_j'$. Replacing $\Omega$ by $\Omega_0$ and $\varphi_j$ by $\varphi_j'$, we obtain the result as stated above, with $\Omega$ Borel.

For the final statement, suppose that $\Omega_1$ and $\Omega_2$ are atomless, and let $W$, $\varphi_j$ and $\Omega$ be as in the first part of the theorem, with $\Omega$ Borel. Suppose that $\Omega$ has atoms, i.e., points $a \in \Omega$ with $\mu\{a\} > 0$. Replace each such point $a$ by a set $I_a$ which is a copy of the interval $[0,\mu\{a\}]$ (with Borel $\sigma$-field and Lebesgue measure), and let $\Omega'$ be the resulting Borel probability space. There is an obvious map $\pi : \Omega' \to \Omega$, mapping each $I_a$ to $a$ and being the identity elsewhere, and we let $W' := W^{\pi}$. For each atom $a$ and $j = 1,2$, let $A_{aj} := \varphi_j^{-1}(a) \subseteq \Omega_j$. Then $A_{aj}$ is an atomless measurable space, and by Lemma A.2 (and scaling), there is a measure-preserving map $A_{aj} \to I_a$. Combining these maps and the original $\varphi_j$, we find a measure-preserving map $\varphi_j' : \Omega_j \to \Omega'$ such that $\varphi_j = \pi\circ\varphi_j'$, and thus $W_j = W^{\varphi_j} = (W')^{\varphi_j'}$ a.e. Finally, $\Omega'$ is an atomless Borel probability space, and may thus be replaced by $[0,1]$ by Theorem A.7. □

Remark 8.5. With a Lebesgue space $\Omega$, it is both natural and necessary to consider maps $\Omega_j \to \Omega$ that are measurable with respect to the completion of $\Omega_j$, as done in [3]. For example, if $\Omega_1 = \Omega_2 = [0,1]$ with the Borel $\sigma$-field and $W_1(x,y) = W_2(x,y) = xy$, we can take $\Omega = [0,1]$ and $\varphi_j = \iota$ (the identity), but if we equip $\Omega$ with the Lebesgue $\sigma$-field, then $\varphi_j$ is not measurable $\Omega_j \to \Omega$ (and cannot be modified on a null set to become measurable). This is just a trivial technicality that is no real problem, and as seen in the proof above, it can be avoided by using Borel spaces.






Theorem 8.4 says that two equivalent graphons are always pull-backs of a single graphon. We may also try to go in the opposite direction and try to find a common pull-back of two equivalent graphons. As shown by Borgs, Chayes and Lovasz, this is not always possible, see Example 8.13 below, but it is possible for graphons defined on Borel or Lebesgue spaces. We state this, in several versions, in the next theorem, together with conditions under which $W_1$ is a pull-back or rearrangement of $W_2$. (Recall that Example 8.2 shows that this does not hold in general, not even for a nice Borel space like $[0,1]$.)

If $W$ is a kernel defined on a probability space $\Omega$, we say, following [13], that $x_1, x_2 \in \Omega$ are twins (for $W$) if $W(x_1,y) = W(x_2,y)$ for a.e. $y \in \Omega$. We say that $W$ is almost twinfree if there exists a null set $N \subset \Omega$ such that there are no twins $x_1, x_2 \in \Omega\setminus N$ with $x_1 \ne x_2$.
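On a finite probability space in which every point has positive mass, "for a.e. $y$" means "for every $y$", so twins are simply points with identical rows of $W$. Merging each twin class into one point gives a twinfree graphon $W_2$ and a measure-preserving quotient map $q$ with $W = W_2^{q}$, an instance of the situation in Theorem 8.6(vi) below. A minimal sketch under these finiteness assumptions; `merge_twins` is a hypothetical helper, not notation from the text:

```python
from fractions import Fraction


def merge_twins(prob, W):
    """Collapse twins of a graphon on a finite probability space.

    prob[x] > 0 is the mass of point x and W a symmetric matrix.  As every
    point has positive mass, x and x' are twins exactly when the rows W[x]
    and W[x'] coincide.  Returns (q, prob2, W2): q[x] is the twin class of x,
    prob2 the class masses and W2 the graphon on the classes, with
    W[x][y] == W2[q[x]][q[y]], i.e. W = W2^q for the measure-preserving map q.
    """
    classes, q = {}, []
    for row in W:
        q.append(classes.setdefault(tuple(row), len(classes)))
    k = len(classes)
    prob2 = [0] * k
    for x, p in enumerate(prob):
        prob2[q[x]] += p
    reps = [q.index(c) for c in range(k)]  # one representative per class
    W2 = [[W[reps[c]][reps[d]] for d in range(k)] for c in range(k)]
    return q, prob2, W2


quarter = Fraction(1, 4)
W = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]  # K_{2,2}, parts {0, 2} and {1, 3}
q, prob2, W2 = merge_twins([quarter] * 4, W)
print(q, prob2, W2)  # [0, 1, 0, 1] [Fraction(1, 2), Fraction(1, 2)] [[0, 1], [1, 0]]
```

Symmetry of $W$ is what guarantees that the rows of `W2` are again pairwise distinct, so the merged graphon has no twins.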

Various parts of the following theorem are given, at least for the standard case of graphons on $\Omega_1 = \Omega_2 = [0,1]$, in Diaconis and Janson [24] (as a consequence of Hoover's equivalence theorem for representations of exchangeable arrays [43, Theorem 7.28]), Bollobas and Riordan [12], and Borgs, Chayes and Lovasz [13]. A similar theorem in the related case of partial orders is given in [40].



Theorem 8.6. Let $W_1$ and $W_2$ be kernels defined on Borel probability spaces $(\Omega_1,\mu_1)$ and $(\Omega_2,\mu_2)$. Then the following are equivalent.

(i) $W_1 \cong W_2$.

(ii) There exists a coupling $(\varphi_1,\varphi_2)$, i.e., two measure-preserving maps $\varphi_j : \Omega \to \Omega_j$, $j = 1,2$, for some probability space $\Omega$, such that $W_1^{\varphi_1} = W_2^{\varphi_2}$ a.e., i.e., $W_1(\varphi_1(x),\varphi_1(y)) = W_2(\varphi_2(x),\varphi_2(y))$ a.e.

(iii) There exist measure-preserving maps $\varphi_j : [0,1] \to \Omega_j$, $j = 1,2$, such that $W_1^{\varphi_1} = W_2^{\varphi_2}$ a.e., i.e., $W_1(\varphi_1(x),\varphi_1(y)) = W_2(\varphi_2(x),\varphi_2(y))$ a.e. on $[0,1]^2$.

(iv) There exists a measure-preserving map $\psi : \Omega_1\times[0,1] \to \Omega_2$ such that $W_1^{\pi_1} = W_2^{\psi}$ a.e., where $\pi_1 : \Omega_1\times[0,1] \to \Omega_1$ is the projection, i.e., $W_1(x,y) = W_2(\psi(x,t_1),\psi(y,t_2))$ for a.e. $x,y \in \Omega_1$ and $t_1,t_2 \in [0,1]$.

(v) There exists a probability measure $\mu$ on $\Omega_1\times\Omega_2$ with marginals $\mu_1$ and $\mu_2$ such that $W_1^{\pi_1} = W_2^{\pi_2}$ a.e. on $(\Omega_1\times\Omega_2)^2$, i.e., $W_1(x_1,y_1) = W_2(x_2,y_2)$ for $\mu$-a.e. $(x_1,x_2), (y_1,y_2) \in \Omega_1\times\Omega_2$.

If $W_2$ is almost twinfree, then these are also equivalent to:

(vi) There exists a measure-preserving map $\varphi : \Omega_1 \to \Omega_2$ such that $W_1 = W_2^{\varphi}$ a.e., i.e., $W_1(x,y) = W_2(\varphi(x),\varphi(y))$ a.e. on $\Omega_1^2$.

If both $W_1$ and $W_2$ are almost twinfree, then these are also equivalent to:

(vii) There exists a measure-preserving map $\varphi : \Omega_1 \to \Omega_2$ such that $\varphi$ is a bimeasurable bijection of $\Omega_1\setminus N_1$ onto $\Omega_2\setminus N_2$ for some null sets $N_1 \subseteq \Omega_1$ and $N_2 \subseteq \Omega_2$, and $W_1 = W_2^{\varphi}$ a.e., i.e., $W_1(x,y) = W_2(\varphi(x),\varphi(y))$ a.e. on $\Omega_1^2$. If further $(\Omega_2,\mu_2)$ is atomless, for example if $\Omega_2 = [0,1]$, then we may take $N_1 = N_2 = \emptyset$, so $W_1$ is a rearrangement of $W_2$ and vice versa.



The same results hold if $\Omega_1$ and $\Omega_2$ are Lebesgue spaces, provided in (iii) $[0,1]$ is equipped with the Lebesgue $\sigma$-field, and in (iv) $\Omega_1\times[0,1]$ has the completed $\sigma$-field.

Proof. We assume that $\Omega_1$ and $\Omega_2$ are Borel spaces. The Lebesgue space case follows immediately from this case by replacing $W_1$ and $W_2$ by (a.e. equal) Borel kernels, see Remark 2.5.

We may also, when convenient, assume that $W_1$ and $W_2$ are graphons, by using again the transformations $W_1/(1+W_1)$ and $W_2/(1+W_2)$.

First note that any of (iii)–(vii) is a special case of (ii), and that (ii) implies $W_1 \cong W_1^{\varphi_1} = W_2^{\varphi_2} \cong W_2$; thus any of (ii)–(vii) implies (i). We turn to the converses.

(i) $\Longrightarrow$ (ii), (v): Assume $W_1 \cong W_2$, i.e., $\delta_\square(W_1,W_2) = 0$. First, by Theorem 6.16, there exists a coupling $(\varphi_1,\varphi_2)$ such that $\|W_1^{\varphi_1} - W_2^{\varphi_2}\|_\square = \delta_\square(W_1,W_2) = 0$, and thus, by Lemma 4.7, $W_1^{\varphi_1} = W_2^{\varphi_2}$ a.e. Consequently, (ii) holds. Moreover, by the same theorem and Theorem 6.9(ii), we may take this coupling $(\varphi_1,\varphi_2)$ as the projections $(\pi_1,\pi_2)$ for a suitable measure $\mu$ on $\Omega_1\times\Omega_2$, which shows (v).



(v) $\Longrightarrow$ (iii): Since $(\Omega_1\times\Omega_2,\mu)$ is a Borel probability space, Theorem A.9 shows that there exists a measure-preserving map $\psi : [0,1] \to \Omega_1\times\Omega_2$, and then $(\pi_1\circ\psi, \pi_2\circ\psi)$ is a coupling defined on $\Omega = [0,1]$, which shows (iii). (Alternatively, (i) $\Longrightarrow$ (iii) also follows easily by Theorem A.9 from the special case $\Omega_1 = \Omega_2 = [0,1]$ shown in [24].)



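(i) $\Longrightarrow$ (iv): By Theorem A.9 there exist measure preserving maps
$\gamma_j:[0,1]\to\Omega_j$, $j=1,2$. Then $W_1^{\gamma_1}$ and $W_2^{\gamma_2}$ are kernels on
$[0,1]$, and $W_1^{\gamma_1}\cong W_1\cong W_2\cong W_2^{\gamma_2}$. The equivalence
(i) $\Longleftrightarrow$ (iv) was shown (for graphons, which suffices as remarked above) in
[24] in the special case $\Omega_1=\Omega_2=[0,1]$, based on [43, Theorem 7.28], and thus (iv)
holds for $W_1^{\gamma_1}$ and $W_2^{\gamma_2}$. In other words, there exists a measure
preserving function $h:[0,1]^2\to[0,1]$ such that
$W_1^{\gamma_1}(x,y)=W_2^{\gamma_2}(h(x,z_1),h(y,z_2))$ for a.e. $x,y,z_1,z_2\in[0,1]$.
By Lemma 8.9 below (applied to $(\Omega_1,\mu_1)$ and $\gamma_1$), there exists a measure
preserving map $\alpha:\Omega_1\times[0,1]\to[0,1]$ such that $\gamma_1(\alpha(s,u))=s$ a.e.
Hence, for a.e. $x,y\in\Omega_1$ and $u_1,u_2,z_1,z_2\in[0,1]$,
$$
\begin{aligned}
W_1(x,y)&=W_1\bigl(\gamma_1\circ\alpha(x,u_1),\gamma_1\circ\alpha(y,u_2)\bigr)
=W_1^{\gamma_1}\bigl(\alpha(x,u_1),\alpha(y,u_2)\bigr)\\
&=W_2^{\gamma_2}\bigl(h(\alpha(x,u_1),z_1),h(\alpha(y,u_2),z_2)\bigr)\\
&=W_2\bigl(\gamma_2\circ h(\alpha(x,u_1),z_1),\gamma_2\circ h(\alpha(y,u_2),z_2)\bigr).
\end{aligned}
$$
Finally, let $\beta=(\beta_1,\beta_2)$ be a measure preserving map $[0,1]\to[0,1]^2$, and
define $\psi(x,t):=\gamma_2\circ h(\alpha(x,\beta_1(t)),\beta_2(t))$. Then
$\psi:\Omega_1\times[0,1]\to\Omega_2$ is measure preserving, and the identity above yields
$W_1(x,y)=W_2(\psi(x,t_1),\psi(y,t_2))$ for a.e. $x,y\in\Omega_1$ and $t_1,t_2\in[0,1]$,
which is (iv).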



(iv) $\Longrightarrow$ (vi): Since, for a.e. $x,y,t_1,t_2,t_1'$,
$$W_2(\psi(x,t_1),\psi(y,t_2))=W_1(x,y)=W_2(\psi(x,t_1'),\psi(y,t_2))$$
and $\psi$ is measure preserving, it follows that for a.e. $x,t_1,t_1'$, $\psi(x,t_1)$ and
$\psi(x,t_1')$ are twins for $W_2$. If $W_2$ is almost twin-free, with exceptional null
set $N$, then further $\psi(x,t_1),\psi(x,t_1')\notin N$ for a.e. $x,t_1,t_1'$, since $\psi$ is
measure preserving, and consequently $\psi(x,t_1)=\psi(x,t_1')$ for a.e. $x,t_1,t_1'$. It
follows that we can choose a fixed $t_1'$ (almost every choice will do) such that
$\psi(x,t)=\psi(x,t_1')$ for a.e. $x,t$. Define $\varphi(x):=\psi(x,t_1')$. Then
$\psi(x,t)=\varphi(x)$ for a.e. $x,t$, which in particular implies that $\varphi$ is measure
preserving, and (iv) yields $W_1(x,y)=W_2(\varphi(x),\varphi(y))$ a.e.



(vi) $\Longrightarrow$ (vii): Let $N'\subset\Omega_1$ be a null set such that if $x\notin N'$, then
$W_1(x,y)=W_2(\varphi(x),\varphi(y))$ for a.e. $y\in\Omega_1$. If $x,x'\in\Omega_1\setminus N'$ and
$\varphi(x)=\varphi(x')$, then $x$ and $x'$ are twins for $W_1$. Consequently, if $W_1$ is almost
twin-free with exceptional null set $N''$, then $\varphi$ is injective on $\Omega_1\setminus N_1$
with $N_1:=N'\cup N''$. Since $\Omega_1\setminus N_1$ and $\Omega_2$ are Borel spaces, Theorem A.6
shows that the injective map $\varphi:\Omega_1\setminus N_1\to\Omega_2$ has measurable range and
is a bimeasurable bijection $\varphi:\Omega_1\setminus N_1\to\Omega_2\setminus N_2$ for some
measurable set $N_2\subset\Omega_2$. Since $\varphi$ is measure preserving, $\mu_2(N_2)=0$.

If $\Omega_2$ has no atoms, we may take an uncountable null set $N_2'\subset\Omega_2\setminus N_2$.
Let $N_1':=\varphi^{-1}(N_2')$. Then $N_1\cup N_1'$ and $N_2\cup N_2'$ are uncountable Borel
spaces, so they are isomorphic and there is a bimeasurable bijection
$\eta:N_1\cup N_1'\to N_2\cup N_2'$. Redefine $\varphi$ on $N_1\cup N_1'$ so that $\varphi=\eta$
there; then $\varphi$ becomes a bijection $\Omega_1\to\Omega_2$. □



Remark 8.7. A probabilistic reformulation of (ii) along the lines of Remark 6.1
is that there exists a coupling $(X,Y)$ of random variables with the distributions
$\mu_1$ on $\Omega_1$ and $\mu_2$ on $\Omega_2$, such that if $(X',Y')$ is an independent copy of
$(X,Y)$, then $W_1(X,X')=W_2(Y,Y')$ a.s. Similarly, (v) says that there exists a
distribution (i.e., probability measure) $\mu$ on $\Omega_1\times\Omega_2$ with marginals $\mu_1$
and $\mu_2$ such that if $(X,Y)$ and $(X',Y')$ are independent with the same distribution
$\mu$, then $W_1(X,X')=W_2(Y,Y')$ a.s. […]
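For step graphons (finitely many blocks, each of positive weight) the condition in (v) and in Remark 8.7 can be checked mechanically: "a.s." becomes "for every pair of atoms of positive $\mu$-weight". The following Python sketch is only an illustration under that finite assumption; the function name `is_coupling_witness` and the table representation are hypothetical and not taken from the paper.

```python
from fractions import Fraction as Fr


def is_coupling_witness(W1, p1, W2, p2, mu):
    """Finite version of condition (v) / Remark 8.7 for step graphons.

    W1, W2: symmetric tables of block values; p1, p2: block weights;
    mu[a][c]: candidate weight of the atom (block a of Omega_1, block c of Omega_2).
    Returns True iff mu has marginals p1 and p2 and W1(X, X') == W2(Y, Y') for
    every pair of atoms (X, Y), (X', Y') of positive mu-weight.
    """
    m1, m2 = len(p1), len(p2)
    if any(sum(mu[a][c] for c in range(m2)) != p1[a] for a in range(m1)):
        return False  # first marginal is not mu_1
    if any(sum(mu[a][c] for a in range(m1)) != p2[c] for c in range(m2)):
        return False  # second marginal is not mu_2
    atoms = [(a, c) for a in range(m1) for c in range(m2) if mu[a][c] > 0]
    # with finitely many atoms, "a.s." means: for all pairs of atoms
    return all(W1[a][b] == W2[c][d] for (a, c) in atoms for (b, d) in atoms)


# W2 is W1 with block 0 split into two halves (blocks 0 and 1 of W2).
W1 = [[Fr(9, 10), Fr(1, 10)],
      [Fr(1, 10), Fr(1, 2)]]
p1 = [Fr(1, 2), Fr(1, 2)]
W2 = [[Fr(9, 10), Fr(9, 10), Fr(1, 10)],
      [Fr(9, 10), Fr(9, 10), Fr(1, 10)],
      [Fr(1, 10), Fr(1, 10), Fr(1, 2)]]
p2 = [Fr(1, 4), Fr(1, 4), Fr(1, 2)]
mu = [[Fr(1, 4), Fr(1, 4), Fr(0)],
      [Fr(0), Fr(0), Fr(1, 2)]]
product_measure = [[a * c for c in p2] for a in p1]
```

Here `mu` is accepted, while the product measure, although it has the right marginals, is rejected: it pairs both blocks of `W1` with the same block of `W2`.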



Remark 8.8. In (iv), the seemingly superfluous variables $t_1$ and $t_2$ act as
extra randomization; (iv) thus yields a kind of "randomized pull-back" using
a "randomized measure-preserving map" $\psi$, even when no suitable measure-preserving
map as in (vi) exists. It is an instructive exercise to see how this works for
Example 8.1; we leave this to the reader.

The proof above uses the following consequence of the transfer theorem
[…, Theorem 6.10].

Lemma 8.9. Suppose that $(\Omega,\mu)$ is a Borel probability space and that
$\gamma:[0,1]\to\Omega$ is a measure preserving function. Then there exists a measure
preserving function $\alpha:\Omega\times[0,1]\to[0,1]$ such that $\gamma(\alpha(s,y))=s$ for
$\mu\times\lambda$-a.e. $(s,y)\in\Omega\times[0,1]$.

Proof. Let $\eta:[0,1]\to[0,1]$ and $\tilde\xi:\Omega\to\Omega$ be the identity maps $\eta(x)=x$,
$\tilde\xi(s)=s$, and let $\xi=\gamma:[0,1]\to\Omega$. Then $(\xi,\eta)$ is a pair of random
variables, defined on the probability space $([0,1],\lambda)$, with values in $\Omega$ and
$[0,1]$, respectively; further, $\tilde\xi$ is a random variable defined on $(\Omega,\mu)$ with
$\tilde\xi\overset{d}{=}\xi$. By the transfer theorem […, Theorem 6.10], there exists a measurable
function $\alpha:\Omega\times[0,1]\to[0,1]$ such that if $\tilde\eta(s,y):=\alpha(\tilde\xi(s),y)=\alpha(s,y)$,
then $(\tilde\xi,\tilde\eta)$ is a pair of random variables defined on $\Omega\times[0,1]$ with
$(\tilde\xi,\tilde\eta)\overset{d}{=}(\xi,\eta)$. In particular, $\tilde\eta$ is uniformly distributed,
i.e., $\alpha$ is measure preserving. Since $\xi=\gamma(\eta)$, this implies $\tilde\xi=\gamma(\tilde\eta)$
a.e., and thus $s=\tilde\xi(s)=\gamma(\alpha(s,y))$ a.e. □

There are several other, quite different, characterizations of equivalence.
We give several important conditions from [13], [14] and [24] that use the
homomorphism densities $t(F,W)$ and the random graphs $G(n,W)$ defined
in Appendix C and Appendix D.

Theorem 8.10. Let $W$ and $W'$ be two graphons (possibly defined on different
probability spaces). Then the following are equivalent:

(i) $W\cong W'$.

(ii) $\delta_\square(W,W')=0$.

(iii) $\delta_1(W,W')=0$.

(iv) $t(F,W)=t(F,W')$ for every simple graph $F$.

(v) $t(F,W)=t(F,W')$ for every loopless multigraph $F$.

(vi) The random graphs $G(n,W)$ and $G(n,W')$ have the same distribution
for every finite $n$.

(vii) The infinite random graphs $G(\infty,W)$ and $G(\infty,W')$ have the same
distribution.



Proof. (i) $\Longleftrightarrow$ (ii): This is just our definition of $\cong$.

(ii) $\Longrightarrow$ (iii): If $\delta_\square(W,W')=0$, let $W_0,\dots,W_n$ be a chain of graphons
as in Theorem 8.3 (or Theorem 8.4). We have $\delta_1(W_i,W_i^{\varphi})=0$ for any pull-back
of a graphon $W_i$, and thus $\delta_1(W_{i-1},W_i)=0$ for every $i\ge1$. Hence
$$\delta_1(W,W')\le\sum_{i=1}^{n}\delta_1(W_{i-1},W_i)=0$$
by the triangle inequality Lemma 6.5.

(iii) $\Longrightarrow$ (ii): Trivial.

(i) $\Longrightarrow$ (v): This is immediate from (C.1) for a pull-back, and the general
case follows again by Theorem 8.3.

(v) $\Longrightarrow$ (iv): Trivial.

(iv) $\Longleftrightarrow$ (vi): The distribution of $G(n,W)$ is determined by the family
$\{t(F,W):|F|\le n\}$ of homomorphism densities for all (simple) graphs $F$
with $|F|\le n$, and conversely, cf. Remark D.1.

(vi) $\Longleftrightarrow$ (vii): The distribution of $G(\infty,W)$ is determined by the family
of distributions of the restrictions $G(\infty,W)|_{[n]}$ to the first $n$ vertices, for
$n\ge1$, and conversely. Moreover, $G(\infty,W)|_{[n]}\overset{d}{=}G(n,W)$. See [24] for
details.

(iv) $\Longrightarrow$ (ii): See [14] or, for a different proof, [13]. (This is highly
non-trivial.) Alternatively, (vii) $\Longrightarrow$ (ii) follows from [43, Theorem 7.28], see
[24, Proof of Theorem 7.1]. □
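As an illustration of (vi) and (vii), the following Python sketch samples $G(n,W)$ for a graphon on $[0,1]$ with Lebesgue measure. It assumes the standard construction (i.i.d. points $x_1,\dots,x_n$, and each pair $ij$ joined independently with probability $W(x_i,x_j)$); Appendix D remains the authoritative definition. The names `sample_G_n_W` and `sample_point` are hypothetical. In this construction, keeping only the first $n$ vertices of a sample with more vertices gives a sample with the law of $G(n,W)$, which is the fact used for (vi) $\Longleftrightarrow$ (vii).

```python
import random


def sample_G_n_W(n, W, sample_point, rng):
    """Sample G(n, W): draw i.i.d. points x_1, ..., x_n from mu via sample_point,
    then make each pair i < j an edge independently with probability W(x_i, x_j)."""
    xs = [sample_point(rng) for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < W(xs[i], xs[j]):
                edges.add((i, j))
    return edges


# Example: the graphon W(x, y) = x * y on [0, 1] with Lebesgue measure.
g = sample_G_n_W(10, lambda x, y: x * y, lambda r: r.random(), random.Random(1))
```

Passing an explicit `random.Random` instance keeps samples reproducible; two equivalent graphons give samplers with the same output distribution, although individual samples of course differ.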

Remark 8.11. One of the central results in [14] is that, for graphons
$W_1,W_2,\dots$ and $W$, $\delta_\square(W_n,W)\to0$ if and only if $t(F,W_n)\to t(F,W)$ for
every (simple) graph $F$. (Taking $W_n=W_{G_n}$ for a sequence of graphs with
$|G_n|\to\infty$, this says in particular that $G_n\to W\iff t(F,G_n)\to t(F,W)$
for every graph $F$, see Appendix B.) As pointed out in [12], this equivalence
is equivalent to the corresponding equivalence (ii) $\Longleftrightarrow$ (iv) in Theorem 8.10.
One way to see this is to define a new semimetric on the class $\mathcal W$ of
graphons by
$$\delta_t(W,W'):=\sum_{n=1}^{\infty}2^{-n}\,\bigl|t(F_n,W)-t(F_n,W')\bigr|,$$
where $F_1,F_2,\dots$ is some (arbitrary but fixed) enumeration of all unlabelled
(simple) graphs. By Theorem 8.10, $\delta_t(W,W')=0\iff W\cong W'\iff\delta_\square(W,W')=0$,
so $\delta_t$ is, just as $\delta_\square$, a metric on the quotient space $\widehat{\mathcal W}$. Moreover,
the easy result Lemma C.2 that each $W\mapsto t(F_n,W)$ is continuous for
$\delta_\square$ implies that $\delta_t$ is continuous on $(\widehat{\mathcal W},\delta_\square)$, so the topology on
$\widehat{\mathcal W}$ defined by $\delta_t$ is weaker than the topology defined by $\delta_\square$. (Equivalently,
the identity map $(\widehat{\mathcal W},\delta_\square)\to(\widehat{\mathcal W},\delta_t)$ is continuous.) However, since
$(\widehat{\mathcal W},\delta_\square)$ is compact, this implies that the topologies are the same, i.e., that
the metrics $\delta_\square$ and $\delta_t$ are equivalent on $\widehat{\mathcal W}$, which is the result we want.
(The argument in [12] is essentially the same, but stated somewhat differently.
Another equivalent version is to consider the mapping $\widehat{\mathcal W}\to[0,1]^\infty$ given by
$W\mapsto(t(F_n,W))_{n=1}^\infty$, see [24]; this map is continuous and, by Theorem 8.10,
injective, so again by compactness it is a homeomorphism onto some subset.)
Note the importance of the compactness of $\widehat{\mathcal W}$ in these arguments.
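For a step graphon (blocks of weights $p_i$ and constant values $W_{ij}$) the densities $t(F,W)$ are finite sums over all assignments of the vertices of $F$ to blocks, so a truncation of $\delta_t$ can be computed exactly. A Python sketch follows; the representation, the function names and the short list of test graphs are illustrative assumptions, and a finite truncation is of course only a partial check, not the metric $\delta_t$ itself.

```python
from itertools import product


def hom_density(F, W, p):
    """t(F, W) for a step graphon with block weights p and block values W.

    F = (k, edges): vertices 0..k-1 and a list of edges; repeated edges are
    allowed, so loopless multigraphs also work.  Sums over all maps V(F) -> blocks.
    """
    k, edges = F
    total = 0.0
    for phi in product(range(len(p)), repeat=k):
        term = 1.0
        for v in phi:
            term *= p[v]
        for u, v in edges:
            term *= W[phi[u]][phi[v]]
        total += term
    return total


def delta_t(W1, p1, W2, p2, graphs):
    """Truncation of delta_t: sum of 2^{-n} |t(F_n, W1) - t(F_n, W2)| over the
    finitely many graphs supplied (the full sum runs over all graphs)."""
    return sum(2.0 ** -(n + 1) * abs(hom_density(F, W1, p1) - hom_density(F, W2, p2))
               for n, F in enumerate(graphs))


K2 = (2, [(0, 1)])
P3 = (3, [(0, 1), (1, 2)])
K3 = (3, [(0, 1), (1, 2), (0, 2)])
C4 = (4, [(0, 1), (1, 2), (2, 3), (3, 0)])
GRAPHS = [K2, P3, K3, C4]
```

Relabelling the blocks of a step graphon (a rearrangement, hence an equivalent graphon) leaves every $t(F,W)$ unchanged, so the truncated $\delta_t$ vanishes up to rounding, in line with Theorem 8.10.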

The distribution of a kernel $W$ defined on a probability space $(\Omega,\mu)$ is
the distribution of $W$ regarded as a random variable defined on $\Omega^2$, i.e., the
push-forward of $\mu^2$ by $W$, or equivalently the probability measure on $\mathbb R$ that
makes $W:\Omega^2\to\mathbb R$ measure-preserving, see Remark 5.4.

Corollary 8.12. If $W_1$ and $W_2$ are two equivalent graphons, defined on two
probability spaces $\Omega_1$ and $\Omega_2$, then $W_1$ and $W_2$ have the same distributions.
In particular, $\int_{\Omega_1^2}W_1^k=\int_{\Omega_2^2}W_2^k$ for every $k\ge1$.

Proof. The conclusion obviously holds if $W_1$ is a.e. equal to a pull-back $W_2^{\varphi}$ of
$W_2$, or conversely. The general case follows by Theorem 8.3 (or Theorem 8.4)
and transitivity. Alternatively, we may use Theorem 8.10 and observe that
$\int_{\Omega_i^2}W_i^k=t(M_k,W_i)$ if $M_k$ is the multigraph consisting of $k$ parallel edges, see
Example C.1. □

Note that, for any $k>1$, $W\mapsto\int_{\Omega^2}W^k$ is not continuous in the cut norm,
see Example C.3.
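The easy case in the proof of Corollary 8.12, where one graphon is a pull-back of the other, can be checked exactly for step graphons with rational arithmetic. The Python sketch below assumes block weights and values given as `Fraction`s; the names and the particular measure-preserving map (splitting one block into two halves) are illustrative choices, not from the paper.

```python
from collections import Counter
from fractions import Fraction as Fr


def kernel_distribution(W, p):
    """Distribution of a step kernel: the push-forward of mu x mu under W,
    returned as {value: probability}."""
    dist = Counter()
    for i, pi in enumerate(p):
        for j, pj in enumerate(p):
            dist[W[i][j]] += pi * pj
    return dict(dist)


def split_block(W, p, b):
    """Pull-back of W along a measure-preserving map that sends two halves of
    block b onto b: block b becomes blocks b and m, each of weight p[b] / 2."""
    m = len(p)
    f = list(range(m)) + [b]  # new block -> old block
    q = list(p) + [p[b] / 2]
    q[b] = p[b] / 2
    W2 = [[W[f[i]][f[j]] for j in range(m + 1)] for i in range(m + 1)]
    return W2, q


def moment(W, p, k):
    """int W^k d(mu x mu), which equals t(M_k, W) for M_k = k parallel edges."""
    return sum(pr * v ** k for v, pr in kernel_distribution(W, p).items())
```

Exact `Fraction` arithmetic makes the comparison of distributions an equality test rather than a tolerance check.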

Finally we give, as promised above, the counter-example by Borgs, Chayes
and Lovasz [13], showing that the condition that the spaces are Borel (or
Lebesgue) is needed in Theorem 8.6.

Example 8.13. Let $A\subset[0,1]$ be a non-measurable set such that the outer
measure $\lambda^*(A)=1$ and the inner measure $\lambda_*(A)=0$. (Equivalently, every
measurable set contained in $A$ or in its complement has measure 0.) Let
$\mathcal L_A:=\{B\cap A:B\in\mathcal L\}$, the trace of the Lebesgue $\sigma$-field on $A$. Then the
outer Lebesgue measure $\lambda^*$ is a probability measure on $(A,\mathcal L_A)$, and the
injection $\iota:A\to[0,1]$ is measure-preserving. (See e.g. [19, Exercises 1.5.8
and 1.5.11].)

Let $W(x,y):=xy$. $W$ is a graphon on $[0,1]$, so its pull-back $W_1:=W^{\iota}$,
which equals the restriction of $W$ to $A\times A$, is a graphon on $\Omega_1:=(A,\mathcal L_A,\lambda^*)$,
and $W_1\cong W$. The complement $A^c:=[0,1]\setminus A$ satisfies the same condition
as $A$, so we may also define $\Omega_2:=(A^c,\mathcal L_{A^c},\lambda^*)$, and let $W_2\cong W$ be the
restriction of $W$ to $A^c\times A^c$.

Then $W_1\cong W\cong W_2$, so $W_1\cong W_2$. However, suppose that $(\varphi_1,\varphi_2)$ is a
coupling of $\Omega_1$ and $\Omega_2$, defined on some space $\Omega$, such that $W_1^{\varphi_1}=W_2^{\varphi_2}$ a.e.
Then the marginal $(W_1^{\varphi_1})^{(1)}$ equals the pull-back $(W_1^{(1)})^{\varphi_1}$ of the marginal
$W_1^{(1)}:\Omega_1\to[0,1]$, but the marginal of $W_1$ is
$$W_1^{(1)}(x)=W^{(1)}(x):=\int_0^1W(x,y)\,dy=x/2;$$
hence $(W_1^{\varphi_1})^{(1)}(x)=\varphi_1(x)/2$ for all $x\in\Omega$. Similarly,
$(W_2^{\varphi_2})^{(1)}(x)=\varphi_2(x)/2$ for all $x\in\Omega$. Our assumption $W_1^{\varphi_1}=W_2^{\varphi_2}$ a.e.
implies that the marginals are equal a.e., and thus $\varphi_1(x)/2=\varphi_2(x)/2$ a.e.;
consequently, $\varphi_1(x)=\varphi_2(x)$ for a.e. $x\in\Omega$. This is a contradiction since for
every $x$, $\varphi_1(x)\in A$ while $\varphi_2(x)\in A^c$.

Consequently, for every coupling $(\varphi_1,\varphi_2)$ of $\Omega_1$ and $\Omega_2$ we have
$W_1^{\varphi_1}\neq W_2^{\varphi_2}$ on a set of positive measure and thus $\|W_1^{\varphi_1}-W_2^{\varphi_2}\|_\square>0$
by Lemma 4.1. Hence, the infima in Theorem 6.…(i) and (iii) are not attained, and
none of Theorem 8.6(ii)–(vii) holds, although (i) does.



9. Pure graphons and a canonical version of a graphon 

We present here a way to select an essentially unique, canonical choice
of graphon among all equivalent graphons corresponding to a graph limit;
more precisely, we construct a graphon that is determined uniquely up to
a.e. rearrangements. This construction is based on Lovasz and Szegedy [51],
although formulated somewhat differently. This will also lead to a new proof
of Theorem 8.4, a proof which we find simpler than the original one.

For convenience, we consider only graphons, although the construction 
extends to general kernels with very few modifications. 

Let $W$ be a graphon on a probability space $(\Omega,\mathcal F,\mu)$. For each $x\in\Omega$,
the section $W_x$ is defined by
$$W_x(y):=W(x,y),\qquad y\in\Omega. \tag{9.1}$$
Thus $W_x$ is a measurable function $\Omega\to[0,1]$, and in particular $W_x\in L^1(\Omega,\mathcal F,\mu)$.

Let $\psi_W:\Omega\to L^1(\Omega,\mathcal F,\mu)$ be the map defined by $\psi_W(x):=W_x$. By a
standard monotone class argument (using e.g. the version of the monotone
class theorem in [37, Theorem A.1]), see also [25, Lemma III.11.16],
$\psi_W:\Omega\to L^1(\Omega,\mathcal F,\mu)$ is measurable.

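Let $\mu_W$ be the push-forward of $\mu$ by $\psi_W$, i.e., the probability measure
on $L^1(\Omega,\mathcal F,\mu)$ that makes $\psi_W:(\Omega,\mu)\to(L^1(\Omega,\mathcal F,\mu),\mu_W)$
measure-preserving, see Remark 5.4; explicitly,
$$\mu_W(A)=\mu\bigl(\psi_W^{-1}(A)\bigr),\qquad A\subseteq L^1(\Omega,\mathcal F,\mu). \tag{9.2}$$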

Further, let $\Omega_W$ be the support of $\mu_W$, i.e.,
$$\Omega_W:=\{f\in L^1(\Omega,\mathcal F,\mu):\mu_W(U)>0\text{ for every neighbourhood }U\text{ of }f\}. \tag{9.3}$$
$\Omega_W$ is a subset of $L^1(\Omega,\mathcal F,\mu)$, and we equip it with the induced metric,
given by the norm in $L^1(\Omega,\mathcal F,\mu)$, and the Borel $\sigma$-field generated by the
metric topology.

Theorem 9.1. (i) $\Omega_W$ is a complete separable metric space, $\mu_W$ is a probability
measure on $\Omega_W$, and $\psi_W(x)\in\Omega_W$ for $\mu$-a.e. $x\in\Omega$. We can thus
regard $\psi_W$ as a mapping $\Omega\to\Omega_W$ (defined a.e.); then $\psi_W:(\Omega,\mu)\to(\Omega_W,\mu_W)$
is measure-preserving.

(ii) $\mu_W$ has full support on $\Omega_W$, i.e., if $U\subseteq\Omega_W$ is open and non-empty,
then $\mu_W(U)>0$.

(iii) The range of $\psi_W$ is dense in $\Omega_W$. More precisely, $\psi_W(\Omega)\cap\Omega_W=
\{W_x:x\in\Omega\}\cap\Omega_W$ is a dense subset of $\Omega_W$.

(iv) $\Omega_W\subseteq\{f\in L^1(\Omega,\mathcal F,\mu):0\le f\le1\text{ a.e.}\}$.

(v) There exists a graphon $\widehat W$ on $(\Omega_W,\mu_W)$ such that the pull-back
$\widehat W^{\psi_W}=W$ a.e.; this graphon $\widehat W$ is unique up to a.e. equality. In
particular, $\widehat W\cong W$.
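For a step graphon everything in Theorem 9.1 is finite and explicit: $\psi_W$ takes finitely many values (one section per block), so $\mu_W$ is a finite atomic measure, $\Omega_W$ is its finite set of atoms, and $\widehat W$ is obtained by merging blocks with equal sections. The Python sketch below assumes all block weights are positive, so that two sections are equal in $L^1$ exactly when their rows agree; `omega_W`, `r_W` and the row-tuple representation are hypothetical illustrations, not the paper's notation.

```python
from fractions import Fraction as Fr


def omega_W(W, p):
    """Omega_W, mu_W and W_hat for a step graphon (block weights p, all > 0).

    For x in block i the section W_x is the step function equal to W[i][k] on
    block k, so it is identified with the row tuple(W[i]); psi_W therefore takes
    finitely many values, and mu_W puts on each distinct row the total weight of
    the blocks having that row.  Returns (rows, masses, atom_of_block, W_hat).
    """
    rows, masses, atom_of_block, rep = [], [], [], []
    for i, pi in enumerate(p):
        row = tuple(W[i])
        if row in rows:
            a = rows.index(row)
            masses[a] += pi
        else:
            a = len(rows)
            rows.append(row)
            masses.append(pi)
            rep.append(i)  # a representative block for atom a
        atom_of_block.append(a)
    # W_hat(f, g) := W(x, y) for any x with W_x = f and y with W_y = g
    W_hat = [[W[i][j] for j in rep] for i in rep]
    return rows, masses, atom_of_block, W_hat


def r_W(f, g, p):
    """The L^1 metric of Omega_W between two sections given as row tuples."""
    return sum(pk * abs(fk - gk) for pk, fk, gk in zip(p, f, g))
```

That $\widehat W^{\psi_W}=W$, i.e. `W_hat[atom[i]][atom[j]] == W[i][j]`, relies on the symmetry of `W`: blocks with equal rows also have equal columns.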

Proof. Recall first that $L^1(\Omega,\mathcal F,\mu)$ is a Banach space, and thus a complete
metric space. In many cases, $L^1(\Omega,\mathcal F,\mu)$ is separable (for example if $\Omega=
[0,1]$ or another Borel space); however, there are cases when $L^1(\Omega,\mathcal F,\mu)$ is
non-separable, see Appendix G, and in order to be completely general, we
have to include some technical details on separability below; these can be
ignored when $\Omega$ is a Borel space (and at the first reading).

Recall also that if $B$ is a Banach space, then $L^1(\Omega,\mathcal F,\mu;B)$ is the Banach
space of functions $f:\Omega\to B$ that are measurable and essentially separably
valued, i.e., there exists a separable subspace $B_1\subseteq B$ such that $f(x)\in B_1$
for a.e. $x$, and further $\int_\Omega\|f\|\,d\mu<\infty$, see e.g. [25, Chapter III, in particular
Section III.6] or the summary in [37, Appendix C]. (Note that [25] uses a
definition of measurability which implicitly includes essential separability,
see [25, Lemma III.6.9].)

Returning to our setting, we remarked above that $\psi_W:\Omega\to L^1(\Omega,\mathcal F,\mu)$
is measurable; furthermore it is bounded since $\|\psi_W(x)\|_{L^1}=\|W_x\|_{L^1}\le1$,
and a monotone class argument (again using e.g. [37, Theorem A.1]) shows
that $\psi_W$ is separably valued; thus $\psi_W\in L^1(\Omega,\mathcal F,\mu;L^1(\Omega,\mathcal F,\mu))$. In fact,
see [25, III.11.16–17], the mapping $W\mapsto\psi_W$ extends to $L^1(\Omega\times\Omega,\mu\times\mu)$,
and more generally to $L^1(\Omega_1\times\Omega_2,\mathcal F_1\times\mathcal F_2,\mu_1\times\mu_2)$ for a product of any
two probability spaces (or, more generally, $\sigma$-finite measure spaces), and this
yields an isometric isomorphism
$$L^1(\Omega_1\times\Omega_2,\mathcal F_1\times\mathcal F_2,\mu_1\times\mu_2)\cong L^1\bigl(\Omega_1,\mathcal F_1,\mu_1;L^1(\Omega_2,\mathcal F_2,\mu_2)\bigr). \tag{9.4}$$

As just said, $\psi_W$ is separably valued, i.e., there exists a separable subspace
$B_1\subseteq B:=L^1(\Omega,\mathcal F,\mu)$ such that $\psi_W(x)\in B_1$ for all $x\in\Omega$. We may replace
$B_1$ by its closure $\overline{B_1}$, and we may thus assume that $B_1$ is a closed subspace of $B$, and
thus a Banach space. Then $\mu_W(B\setminus B_1)=\mu\bigl(\psi_W^{-1}(B\setminus B_1)\bigr)=\mu(\emptyset)=0$, and
it follows from (9.3) that $\Omega_W=\operatorname{supp}(\mu_W)\subseteq B_1$ and, more precisely,
$$\Omega_W=\{f\in B_1:\mu_W(U)>0\text{ for every open }U\subseteq B_1\text{ with }f\in U\}. \tag{9.5}$$
Let $\mathcal A$ be the family of all open subsets $U$ of $B_1$ such that $\mu_W(U)=0$. Then
(9.5) shows that $\Omega_W=B_1\setminus\bigcup_{U\in\mathcal A}U$. The union $\bigcup_{U\in\mathcal A}U$ is open, so this
shows that $\Omega_W$ is a closed subset of $B_1$, and thus a complete separable metric
space as asserted. Moreover, since $B_1$ is separable, this union equals the
union of some countable subfamily; hence $\mu_W\bigl(\bigcup_{U\in\mathcal A}U\bigr)=0$ and $\mu_W(\Omega_W)=
1$, so $\mu_W$ is a probability measure on $\Omega_W$.

By the definition of $\mu_W$, $\mu_W(\Omega_W)=\mu\{x:\psi_W(x)\in\Omega_W\}$, so this also
shows that $\psi_W(x)\in\Omega_W$ for $\mu$-a.e. $x$. Thus we can modify $\psi_W$ on a null
set in $\Omega$ so that $\psi_W:\Omega\to\Omega_W$, and then $\psi_W$ is measure-preserving by the
definition of $\mu_W$.



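This proves (i). Next, if $U$ is an open subset of $\Omega_W$ with $\mu_W(U)=0$,
then $U=V\cap\Omega_W$ for some open $V\subseteq B_1$. Then $\mu_W(V)=\mu_W(U)=0$, and
thus $V\in\mathcal A$, so $V\subseteq B_1\setminus\Omega_W$ and $U=V\cap\Omega_W=\emptyset$, which proves (ii).

If $U\subseteq\Omega_W$ is open and nonempty, then $\mu_W(U)>0$ by (ii) and thus
$\mu\bigl(\psi_W^{-1}(U)\bigr)>0$ by (9.2); in particular $\psi_W^{-1}(U)\neq\emptyset$, hence
$U\cap\psi_W(\Omega)\neq\emptyset$, which shows (iii).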



The set $Q:=\{f\in L^1(\Omega,\mathcal F,\mu):0\le f\le1\text{ a.e.}\}$ is a closed subset of
$L^1(\Omega,\mathcal F,\mu)$. Since $W_x(y)=W(x,y)\in[0,1]$ for every $x$ and $y$, it follows
that $\psi_W(x)\in Q$ for every $x$, and (iii) implies (iv).



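To show (v), let $\mathcal F_W\subseteq\mathcal F$ be the $\sigma$-field on $\Omega$ induced by $\psi_W$, i.e.,
$$\mathcal F_W:=\{\psi_W^{-1}(A):A\subseteq L^1(\Omega,\mathcal F,\mu)\text{ is measurable}\}. \tag{9.6}$$
By definition, $\psi_W$ is measurable $(\Omega,\mathcal F_W,\mu)\to L^1(\Omega,\mathcal F,\mu)$, so, using (9.4), we
can regard $\psi_W$ as an element of
$$L^1\bigl(\Omega,\mathcal F_W,\mu;L^1(\Omega,\mathcal F,\mu)\bigr)\cong L^1(\Omega\times\Omega,\mathcal F_W\times\mathcal F,\mu\times\mu). \tag{9.7}$$
This shows the existence of $W_1\in L^1(\Omega\times\Omega,\mathcal F_W\times\mathcal F,\mu\times\mu)$ such that
$W_1=W$ a.e. Consequently, the conditional expectation
$$\mathbb E(W\mid\mathcal F_W\times\mathcal F)=\mathbb E(W_1\mid\mathcal F_W\times\mathcal F)=W_1=W\quad\text{a.e.}$$
By symmetry, also $\mathbb E(W\mid\mathcal F\times\mathcal F_W)=W$ a.e., and thus
$$\mathbb E(W\mid\mathcal F_W\times\mathcal F_W)=\mathbb E\bigl(\mathbb E(W\mid\mathcal F_W\times\mathcal F)\bigm|\mathcal F\times\mathcal F_W\bigr)=W\quad\text{a.e.}$$
Hence $W=W_2$ a.e., where $W_2:\Omega^2\to[0,1]$ is $\mathcal F_W\times\mathcal F_W$-measurable,
which implies that $W_2=(W')^{\psi_W}$ for some measurable $W':\Omega_W^2\to[0,1]$; we
can symmetrize $W'$ to obtain the desired graphon $\widehat W(x,y):=(W'(x,y)+
W'(y,x))/2$.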

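If $\widehat W_1:\Omega_W^2\to[0,1]$ is another graphon such that $\widehat W_1^{\psi_W}=W=\widehat W^{\psi_W}$
$\mu\times\mu$-a.e., then $\widehat W_1=\widehat W$ $\mu_W\times\mu_W$-a.e., by the definitions of pull-back and
$\mu_W$.

Finally, $\widehat W\cong W$ since $W$ is a.e. equal to a pull-back of $\widehat W$. □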

Remark 9.2. Since $\Omega_W$ is a complete separable metric space, the probability
space $(\Omega_W,\mu_W)$ is a Borel space, see Appendix A.2.

Following [51], but using our notations, we make the following definition:

Definition 9.3. A graphon $W$ on $\Omega$ is pure if the mapping $\psi_W$ is a bijection
of $\Omega$ onto $\Omega_W$.

Note that $\psi_W$ is injective $\iff$ $W$ is twin-free (see Section 5). It follows
easily that a graphon is pure if and only if it is twin-free and the metric
$r(x,y):=\|W(x,\cdot)-W(y,\cdot)\|_{L^1}$ on $\Omega$ is complete, and further $\mu$ has full
support in the metric space $(\Omega,r)$. (Then, automatically, $(\Omega,r)$ is separable.)
See further [51].

Remark 9.4. Let $W,W'$ be graphons on the same probability space $\Omega$
with $W'=W$ a.e. Then $W_x(y)=W'_x(y)$ for a.e. $y$, for a.e. $x$; in other
words, $\psi_W(x)=\psi_{W'}(x)$ for a.e. $x$. Consequently, $\mu_W=\mu_{W'}$, and thus also
$\Omega_W=\Omega_{W'}$. We have $\widehat{W'}^{\psi_W}=\widehat{W'}^{\psi_{W'}}=W'=W$ a.e., and thus $\widehat{W'}=\widehat W$
a.e. by the uniqueness statement in Theorem 9.1.

Note that if $W$ is pure and $W'=W$ a.e., then $W'$ is not necessarily pure;
however, $W'$ is pure if $\mu\{y:W(x,y)\neq W'(x,y)\}=0$ for every $x$ (and not
just for almost every $x$).

We let $\widehat W$ denote the graphon constructed in Theorem 9.1. Note that $\widehat W$
is defined only up to a.e. equivalence, so we have some freedom in choosing
$\widehat W$. We will show (Lemma 9.6) that there is a choice of $\widehat W$ that is a pure
graphon.

Lemma 9.5. Let $W_1$ and $W_2$ be two graphons defined on probability spaces
$(\Omega_1,\mu_1)$ and $(\Omega_2,\mu_2)$, respectively, and suppose that $\varphi:\Omega_1\to\Omega_2$ is a
measure-preserving mapping such that $W_1=W_2^{\varphi}$ a.e. Then the pull-back map
$\varphi^*:f\mapsto f^{\varphi}$ is an isometric measure-preserving bijection of $\Omega_{W_2}$ onto $\Omega_{W_1}$,
and $\widehat W_1^{\varphi^*}=\widehat W_2$ a.e.

Proof. By Remark 9.4 we may replace $W_1$ by $W_2^{\varphi}$ and thus assume $W_1=
W_2^{\varphi}$ everywhere and not just a.e.

Since $\varphi$ is measure-preserving, $\varphi^*$ is an isometric injection
$L^1(\Omega_2,\mu_2)\to L^1(\Omega_1,\mu_1)$.

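If $x\in\Omega_1$, then the composition $\varphi^*\circ\psi_{W_2}\circ\varphi$ maps $x\in\Omega_1$ to
$$\varphi^*\bigl(\psi_{W_2}(\varphi(x))\bigr)=\varphi^*\bigl(W_{2,\varphi(x)}\bigr)=\bigl(W_{2,\varphi(x)}\bigr)^{\varphi},$$
which is the mapping
$$x'\mapsto\bigl(W_{2,\varphi(x)}\bigr)^{\varphi}(x')=W_2(\varphi(x),\varphi(x'))=W_2^{\varphi}(x,x')=W_1(x,x')=W_{1,x}(x'),$$
and thus
$$\varphi^*\bigl(\psi_{W_2}(\varphi(x))\bigr)=W_{1,x}=\psi_{W_1}(x). \tag{9.8}$$
In other words, $\varphi^*\circ\psi_{W_2}\circ\varphi=\psi_{W_1}$. Since $\psi_{W_1}$, $\psi_{W_2}$ and $\varphi$ are
measure-preserving, it follows that for any measurable $A\subseteq L^1(\Omega_1,\mu_1)$,
$$
\begin{aligned}
\mu_{W_2}\bigl((\varphi^*)^{-1}(A)\bigr)&=\mu_2\bigl(\psi_{W_2}^{-1}((\varphi^*)^{-1}(A))\bigr)
=\mu_1\bigl(\varphi^{-1}(\psi_{W_2}^{-1}((\varphi^*)^{-1}(A)))\bigr)\\
&=\mu_1\bigl(\psi_{W_1}^{-1}(A)\bigr)=\mu_{W_1}(A).
\end{aligned}\tag{9.9}
$$
Hence, $\varphi^*:(L^1(\Omega_2,\mu_2),\mu_{W_2})\to(L^1(\Omega_1,\mu_1),\mu_{W_1})$ is measure-preserving.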

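In particular, (9.9) shows that $\mu_{W_2}\bigl((\varphi^*)^{-1}(\Omega_{W_1})\bigr)=\mu_{W_1}(\Omega_{W_1})=1$.
Moreover, $(\varphi^*)^{-1}(\Omega_{W_1})$ is closed in $L^1(\Omega_2,\mu_2)$ since $\Omega_{W_1}$ is closed in
$L^1(\Omega_1,\mu_1)$ and $\varphi^*$ is continuous, and thus it follows, see (9.3), that
$$\Omega_{W_2}=\operatorname{supp}(\mu_{W_2})\subseteq(\varphi^*)^{-1}(\Omega_{W_1}).$$
In other words, $\varphi^*:\Omega_{W_2}\to\Omega_{W_1}$.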

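Next, since $\varphi^*$ is an isometry and $\Omega_{W_2}$ is a complete metric space (by
Theorem 9.1(i)), $\varphi^*(\Omega_{W_2})$ is a complete subset of the metric space $\Omega_{W_1}$, and
thus $\varphi^*(\Omega_{W_2})$ is closed. By (9.9),
$$\mu_{W_1}\bigl(\varphi^*(\Omega_{W_2})\bigr)=\mu_{W_2}\bigl((\varphi^*)^{-1}(\varphi^*(\Omega_{W_2}))\bigr)=\mu_{W_2}(\Omega_{W_2})=1.$$
Thus by (9.3) again (or by Theorem 9.1(ii)),
$$\Omega_{W_1}=\operatorname{supp}(\mu_{W_1})\subseteq\varphi^*(\Omega_{W_2}).$$
Hence $\varphi^*$ is a bijection $\Omega_{W_2}\to\Omega_{W_1}$.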
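Finally, by (9.8), a.e. on $\Omega_1\times\Omega_1$,
$$\bigl((\widehat W_1^{\varphi^*})^{\psi_{W_2}}\bigr)^{\varphi}=\widehat W_1^{\varphi^*\circ\psi_{W_2}\circ\varphi}=\widehat W_1^{\psi_{W_1}}=W_1=W_2^{\varphi},$$
and thus $(\widehat W_1^{\varphi^*})^{\psi_{W_2}}=W_2$ a.e. on $\Omega_2\times\Omega_2$. Consequently, $\widehat W_1^{\varphi^*}=\widehat W_2$ a.e.
by the uniqueness statement in Theorem 9.1(v). □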

Lemma 9.6. For any graphon $W$, $\widehat W$ in Theorem 9.1 can be chosen to be
a pure graphon on $\Omega_W$.

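Proof. Let $W$ be defined on $\Omega$. The construction in Theorem 9.1 yields the
graphon $\widehat W$ defined on $\Omega_W\subseteq L^1(\Omega,\mathcal F,\mu)$. We repeat the construction,
starting with $\widehat W$ on $\Omega_W$, and obtain the graphon $\widehat{\widehat W}$ on $\Omega_{\widehat W}$, where
$\Omega_{\widehat W}\subseteq L^1(\Omega_W,\mu_W)$. Since $\psi_W:\Omega\to\Omega_W$ is measure-preserving and $\widehat W^{\psi_W}=W$
a.e. by Theorem 9.1, it follows by Lemma 9.5 that $\psi_W^*$ is an isometric
bijection of $\Omega_{\widehat W}$ onto $\Omega_W$; thus $(\psi_W^*)^{-1}$ is a bijection $\Omega_W\to\Omega_{\widehat W}$.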

We will show that we can modify $\widehat W$ on a null set so that $\psi_{\widehat W}=(\psi_W^*)^{-1}$.
For a.e. $x\in\Omega$, we have $\psi_W(x)\in\Omega_W\subseteq L^1(\Omega,\mathcal F,\mu)$, and then $\psi_{\widehat W}(\psi_W(x))$
is by (9.1) the function in $L^1(\Omega_W,\mu_W)$ given by
$$\psi_{\widehat W}(\psi_W(x))(g)=\widehat W(\psi_W(x),g),\qquad g\in\Omega_W.$$
Consequently, the pull-back map $\psi_W^*$ in Lemma 9.5 maps this to the function
on $\Omega$ given by, for a.e. $x$ and $y$,
$$
\begin{aligned}
\psi_W^*\bigl(\psi_{\widehat W}(\psi_W(x))\bigr)(y)&=\psi_{\widehat W}(\psi_W(x))(\psi_W(y))=\widehat W(\psi_W(x),\psi_W(y))\\
&=\widehat W^{\psi_W}(x,y)=W(x,y)=\psi_W(x)(y);
\end{aligned}
$$
thus $\psi_W^*\bigl(\psi_{\widehat W}(\psi_W(x))\bigr)=\psi_W(x)$ for a.e. $x\in\Omega$. Let
$$A:=\{f\in\Omega_W:\psi_W^*(\psi_{\widehat W}(f))=f\}.$$
We have just shown that $\mu\{x:\psi_W(x)\in A\}=1$, so by (9.2), $\mu_W(A)=1$.
Thus, $\psi_W^*\circ\psi_{\widehat W}:\Omega_W\to\Omega_W$ equals the identity map a.e.

Since $\psi_W^*$ is a bijection, $\psi_{\widehat W}=(\psi_W^*)^{-1}$ on $A\subseteq\Omega_W$. The idea is to modify
$\psi_{\widehat W}$ on the null set $\Omega_W\setminus A$ so that this equality holds everywhere. The
space $\Omega_{\widehat W}$ is included in a separable subspace $B_1\subseteq L^1(\Omega_W,\mu_W)$, and by
Lemma G.1 there exists a measurable evaluation map $\Phi:B_1\times\Omega_W\to\mathbb R$
such that $\Phi(F,g)=F(g)$ for every $F\in B_1$ and $\mu_W$-a.e. $g\in\Omega_W$. Define
$$H(f,g):=\Phi\bigl((\psi_W^*)^{-1}(f),g\bigr),\qquad f,g\in\Omega_W;$$
then $H:\Omega_W\times\Omega_W\to\mathbb R$ is measurable and, for every $f\in\Omega_W$,
$$H(f,g)=(\psi_W^*)^{-1}(f)(g)\quad\text{for a.e. } g\in\Omega_W. \tag{9.10}$$
For every $f\in\Omega_W$, $(\psi_W^*)^{-1}(f)\in\Omega_{\widehat W}$, so by (9.10) and Theorem 9.1(iv),
$0\le H(f,g)\le1$ for a.e. $g\in\Omega_W$. Let $\widetilde H(f,g):=\min\{\max\{H(f,g),0\},1\}\in
[0,1]$. Then, for every $f\in\Omega_W$, by (9.10),
$$\widetilde H(f,g)=H(f,g)=(\psi_W^*)^{-1}(f)(g)\quad\text{for a.e. } g\in\Omega_W. \tag{9.11}$$

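Thus $\widetilde H$ has the desired sections. We define a graphon $\widehat W_1$ on $\Omega_W$ by
$$\widehat W_1(f,g):=\begin{cases}\widehat W(f,g), & f,g\in A;\\ \widetilde H(f,g), & f\notin A,\ g\in A;\\ \widetilde H(g,f), & f\in A,\ g\notin A;\\ 0, & f,g\notin A.\end{cases} \tag{9.12}$$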



Then $\widehat W_1=\widehat W$ a.e., because $\mu_W(A)=1$, so we may replace $\widehat W$ by $\widehat W_1$ in
Theorem 9.1(v). Moreover, if $f\in A$ then $\widehat W_1(f,g)=\widehat W(f,g)$ for a.e. $g$, and
thus $\psi_{\widehat W_1}(f)=\psi_{\widehat W}(f)=(\psi_W^*)^{-1}(f)$.

If $f\notin A$, then $\widehat W_1(f,g)=\widetilde H(f,g)=(\psi_W^*)^{-1}(f)(g)$ for a.e. $g$ by (9.12)
and (9.11), and thus $\psi_{\widehat W_1}(f)=(\psi_W^*)^{-1}(f)$ in this case too.

Consequently, $\psi_{\widehat W_1}=(\psi_W^*)^{-1}$ is a bijection $\Omega_W\to\Omega_{\widehat W}=\Omega_{\widehat W_1}$, where the
final equality is by Remark 9.4. □

Theorem 9.7. Two graphons $W_1$ and $W_2$ are equivalent if and only if $\widehat W_1$ is
an a.e. rearrangement of $\widehat W_2$ by a measure-preserving bijection
that further can be taken to be an isometry.

In other words, $W_1\cong W_2$ if and only if there is an isometric measure-preserving
bijection $\varphi:\Omega_{W_1}\to\Omega_{W_2}$ such that $\widehat W_2^{\varphi}=\widehat W_1$ a.e.

Proof. Consider the class $\mathcal W_m$ of all graphons that are defined on a probability
space that is also a metric space. Define $W_1\equiv W_2$ if $W_1,W_2\in\mathcal W_m$
and $W_1$ is a.e. equal to a rearrangement of $W_2$ by an isometric measure-preserving
bijection; this is an equivalence relation on $\mathcal W_m$. Lemma 9.5
shows that if $W_1$ is a pull-back of $W_2$, then $\widehat W_1\equiv\widehat W_2$.

If $W_1\cong W_2$, then Theorem 8.3 yields a chain of pull-backs linking $W_1$ and
$W_2$, and thus $\widehat W_1\equiv\widehat W_2$.

Conversely, if $\widehat W_1$ equals a rearrangement of $\widehat W_2$ a.e., then
$W_1\cong\widehat W_1\cong\widehat W_2\cong W_2$ by Theorem 9.1(v). □



Corollary 9.8. If $W_1$ and $W_2$ are equivalent graphons, then $\Omega_{W_1}$ and $\Omega_{W_2}$
are isometric metric spaces. □

Theorem 9.9. Every graphon is equivalent to a pure graphon. Two pure
graphons $W_1$ and $W_2$ are equivalent if and only if they are a.e. rearrangements
of each other.

Proof. If $W$ is a graphon, then $W\cong\widehat W$ for a pure graphon $\widehat W$ by
Theorem 9.1(v) and Lemma 9.6.

If $W_1$ is pure, then $\psi_{W_1}$ is a bijection, so $W_1$ is an a.e. rearrangement of
$\widehat W_1$ by Theorem 9.1(v). The same applies to $W_2$, and if further $W_1\cong W_2$,
then Theorem 9.7 yields that $\widehat W_1$ is an a.e. rearrangement of $\widehat W_2$. Since
being an a.e. rearrangement is an equivalence relation, this shows that $W_1$
is an a.e. rearrangement of $W_2$, and conversely. □
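For step graphons Theorem 9.9 turns into a finite test: merge twin blocks to get the pure version, then ask whether the two pure versions differ only by a weight-preserving relabelling of their atoms. The Python sketch below assumes positive rational block weights and exact values; the names are hypothetical, and the brute-force search over permutations is a deliberate simplification suitable only for a handful of blocks.

```python
from fractions import Fraction as Fr
from itertools import permutations


def pure_version(W, p):
    """Merge twin blocks (equal rows) of a step graphon with weights p > 0.
    The result is twin-free; for step graphons it is the pure graphon W_hat."""
    rows, masses, rep = [], [], []
    for i, pi in enumerate(p):
        row = tuple(W[i])
        if row in rows:
            masses[rows.index(row)] += pi
        else:
            rows.append(row)
            masses.append(pi)
            rep.append(i)
    W_hat = [[W[i][j] for j in rep] for i in rep]
    return W_hat, masses


def equivalent_step_graphons(W1, p1, W2, p2):
    """W1 ~ W2 iff the pure versions are rearrangements of each other, i.e.
    related by a weight-preserving permutation of the atoms (finite case)."""
    V1, q1 = pure_version(W1, p1)
    V2, q2 = pure_version(W2, p2)
    m = len(q1)
    if m != len(q2):
        return False
    for s in permutations(range(m)):
        if all(q1[a] == q2[s[a]] for a in range(m)) and all(
                V1[a][b] == V2[s[a]][s[b]] for a in range(m) for b in range(m)):
            return True
    return False


W1 = [[Fr(9, 10), Fr(1, 10)], [Fr(1, 10), Fr(1, 2)]]
p1 = [Fr(1, 3), Fr(2, 3)]
# W1 with its second block split into blocks 0 and 2, and the blocks relabelled
W2 = [[Fr(1, 2), Fr(1, 10), Fr(1, 2)],
      [Fr(1, 10), Fr(9, 10), Fr(1, 10)],
      [Fr(1, 2), Fr(1, 10), Fr(1, 2)]]
p2 = [Fr(1, 3), Fr(1, 3), Fr(1, 3)]
```

Here `W1` and `W2` are recognized as equivalent, whereas changing the weights of `W1` to $(1/2,1/2)$ while keeping its values gives a graphon that is not equivalent to `W2`.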



Proof of Theorem 8.4. Suppose that $W_1\cong W_2$. By Theorem 9.7, $\widehat W_1$ is an
a.e. rearrangement of $\widehat W_2$. Further, $W_1$ is a.e. equal to a pull-back of $\widehat W_1$
by Theorem 9.1, so by composition, $W_1$ is a.e. equal to a pull-back of $\widehat W_2$,
and so is $W_2$ by Theorem 9.1 again. This proves Theorem 8.4 (except the
last sentence, which was shown in Section 8) for graphons, which suffices as
remarked earlier. Note that $\widehat W_2$ is defined on $\Omega_{W_2}$, which by Remark 9.2 is
a Borel space. □

By Corollary 9.8, every graph limit $\Gamma$, i.e. every element of the quotient
space $\widehat{\mathcal W}$, defines a complete separable metric space $\Omega_W$ by taking any
graphon $W$ that represents $\Gamma$; this metric space is uniquely defined up to
isometry. Hence metric and topological properties of $\Omega_W$ are invariants of
graph limits. See Lovasz and Szegedy [51] for some relations between such
properties of $\Omega_W$ and combinatorial properties of the graph limit; it would
be interesting to find further such results.

Example 9.10. A trivial example is that a graph limit is of finite type, i.e.
it can be represented by a step graphon, if and only if $\Omega_W$ is a finite set, see
Example 5.3. Theorems 9.7 and 9.1 imply that every graphon equivalent to
a step graphon is a.e. equal to a step graphon.

Theorem 9.9 shows that we can regard pure graphons as the canonical
choices among all graphons representing a given graph limit. By considering
only pure graphons, equivalence boils down to a.e. equality and rearrangements,
and every graphon $W$ has a pure version constructed as $\widehat W$. This is
theoretically pleasing. (Nevertheless, for many applications it is more
convenient to use other graphons, for example defined on $[0,1]$, regardless of
whether they are pure or not.)

Remark 9.11. In this section, we have used mappings into $L^1(\Omega,\mu)$ and
have constructed $\Omega_W$ as a subset of $L^1(\Omega,\mu)$. We could just as well use
$L^2(\Omega,\mu)$, or $L^p(\Omega,\mu)$ for any $p\in[1,\infty)$. In fact, by Theorem 9.1(iv), $\Omega_W\subset
L^p(\Omega,\mu)$ and the different $L^p$-metrics are equivalent on $\Omega_W$ by Hölder's
inequality; it follows easily that the construction above yields the same space
$\Omega_W$ for any $L^p$ with $1\le p<\infty$, with a different but equivalent metric and
thus the same topology.
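The Hölder step can be made explicit. The following short derivation is a sketch using only Theorem 9.1(iv) and $\mu(\Omega)=1$; it shows that on $\Omega_W$ the $L^1$- and $L^p$-metrics are in fact uniformly equivalent.

```latex
% f, g \in \Omega_W, so 0 \le f, g \le 1 a.e. by Theorem 9.1(iv); hence |f - g| \le 1 a.e.
% Fix 1 \le p < \infty and recall that \mu(\Omega) = 1.
\begin{align*}
\|f-g\|_{L^1} &\le \|f-g\|_{L^p}
  && \text{(H\"older's inequality on a probability space)},\\
\|f-g\|_{L^p}^p = \int_\Omega |f-g|^p\,d\mu &\le \int_\Omega |f-g|\,d\mu = \|f-g\|_{L^1}
  && \text{(since } |f-g|^p \le |f-g| \text{ when } |f-g| \le 1\text{)}.
\end{align*}
\[
  \|f-g\|_{L^1} \;\le\; \|f-g\|_{L^p} \;\le\; \|f-g\|_{L^1}^{1/p},
  \qquad f, g \in \Omega_W,\ 1 \le p < \infty .
\]
```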

9.1. The weak topology on $\Omega_W$. Since $\Omega_W\subset L^2(\Omega,\mu)$, the inner product
$\langle f,g\rangle:=\int_\Omega fg\,d\mu$ is defined and continuous on $\Omega_W\times\Omega_W$. We define further,
again following Lovasz and Szegedy [51] in principle but not in all details,
$$r_{W\circ W}(f,g):=\int_{\Omega_W}|\langle f-g,h\rangle|\,d\mu_W(h),\qquad f,g\in\Omega_W. \tag{9.13}$$
Since $\psi_W:\Omega\to\Omega_W$ is measure-preserving, and $\psi_W(x)=W_x$, this can also
be written
$$r_{W\circ W}(f,g)=\int_\Omega|\langle f-g,W_x\rangle|\,d\mu(x). \tag{9.14}$$
Since $\langle f,g\rangle$ is continuous and bounded on $\Omega_W\times\Omega_W$, it follows by dominated
convergence that $r_{W\circ W}(f,g)$ is continuous on $\Omega_W\times\Omega_W$. We will soon see
(in Theorem 9.13) that it is a metric. We let $r_W$ denote the original metric
on $\Omega_W$, i.e. the $L^1$-norm:
$$r_W(f,g):=\|f-g\|_{L^1}:=\int_\Omega|f(x)-g(x)|\,d\mu(x). \tag{9.15}$$
We have $0\le W_x\le1$ and thus $|\langle f-g,W_x\rangle|\le\|f-g\|_{L^1}$, so by (9.14) and (9.15),
$$r_{W\circ W}(f,g)\le r_W(f,g). \tag{9.16}$$

Remark 9.12. If $W$ is pure, so $\psi_W$ is a bijection, then $r_{W\circ W}$ induces a
metric on $\Omega$ which we also denote by $r_{W\circ W}$; explicitly,
$$
\begin{aligned}
r_{W\circ W}(x,y)&:=r_{W\circ W}(W_x,W_y)=\int_\Omega|\langle W_x-W_y,W_z\rangle|\,d\mu(z)\\
&=\int_\Omega\Bigl|\int_\Omega\bigl(W(x,u)-W(y,u)\bigr)W(z,u)\,d\mu(u)\Bigr|\,d\mu(z)\\
&=\int_\Omega|W\circ W(x,z)-W\circ W(y,z)|\,d\mu(z)\\
&=\|W\circ W(x,\cdot)-W\circ W(y,\cdot)\|_{L^1(\Omega,\mu)},
\end{aligned}
$$
where $W\circ W(x,y):=\int_\Omega W(x,u)W(u,y)\,d\mu(u)$. Thus, if $T_W$ is the integral
operator with kernel $W$, then $W\circ W$ is the kernel of the integral operator
$T_W\circ T_W$, which explains the notation.
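For a step graphon, with $x$ in block $i$ and $y$ in block $j$, the integrals in (9.14), (9.15) and in the last two lines of Remark 9.12 are finite sums, so (9.16) and the identity of Remark 9.12 can be checked numerically. The Python sketch below is an illustration only; the block representation and all names are assumptions, and the identity relies on $W$ being symmetric.

```python
import random


def r_W(i, j, W, p):
    """(9.15): L^1 distance between W_x and W_y for x in block i, y in block j."""
    return sum(pk * abs(W[i][k] - W[j][k]) for k, pk in enumerate(p))


def r_WoW(i, j, W, p):
    """(9.14): integral over z of |<W_x - W_y, W_z>|, a finite double sum here."""
    m = len(p)
    return sum(p[z] * abs(sum(p[u] * (W[i][u] - W[j][u]) * W[z][u] for u in range(m)))
               for z in range(m))


def compose(W, p):
    """Block values of W o W(x, y) = int W(x, u) W(u, y) dmu(u)."""
    m = len(p)
    return [[sum(p[u] * W[x][u] * W[u][y] for u in range(m)) for y in range(m)]
            for x in range(m)]


def random_step_graphon(m, rng):
    """A random symmetric m x m table in [0, 1] with positive weights summing to 1."""
    weights = [rng.random() + 0.1 for _ in range(m)]
    total = sum(weights)
    p = [w / total for w in weights]
    W = [[0.0] * m for _ in range(m)]
    for a in range(m):
        for b in range(a, m):
            W[a][b] = W[b][a] = rng.random()
    return W, p
```

Comparing `r_WoW(i, j, W, p)` with the $L^1$ distance between rows `i` and `j` of `compose(W, p)` checks Remark 9.12, and comparing it with `r_W(i, j, W, p)` checks (9.16).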

Recall that the weak topology $\sigma=\sigma_{L^\infty}$ on $L^1(\Omega,\mu)$ is the topology generated
by the linear functionals $f\mapsto\langle f,h\rangle=\int_\Omega fh\,d\mu$ for $h\in L^\infty(\Omega,\mu)$. In
general, if $X$ and $Y$ are two subsets of $L^1(\Omega)$ such that $\int|fh|<\infty$ when
$f\in X$ and $h\in Y$, let $(X,\sigma_Y)$ denote $X$ with the weak topology generated
by the linear functionals $f\mapsto\langle f,h\rangle$, $h\in Y$. Since the elements of $\Omega_W$
are uniformly bounded functions by Theorem 9.1(iv), it is well-known, see
Lemma F.1(i), that the weak topology on $\Omega_W$ also is generated by $f\mapsto\langle f,h\rangle$
for $h\in L^1(\Omega,\mu)$ (this is the weak$^*$ topology on $L^\infty(\Omega,\mu)$ restricted to $\Omega_W$),
or by the subset $h\in L^p(\Omega,\mu)$ (this is the weak topology on $L^q(\Omega,\mu)$, where
$1/p+1/q=1$, restricted to the subset $\Omega_W$, cf. Remark 9.11). Thus,
$$(\Omega_W,\sigma)=(\Omega_W,\sigma_{L^\infty})=(\Omega_W,\sigma_{L^1})=(\Omega_W,\sigma_{L^2}). \tag{9.17}$$



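We let $\overline{\Omega}_W$ be the closure of $\Omega_W$ in $L^1(\Omega,\mu)$ in the weak topology. It
follows by Theorem 9.1(iv) that $\overline{\Omega}_W\subseteq\{f\in L^1(\Omega,\mu):0\le f\le1\text{ a.e.}\}$,
and thus, by Lemma F.1(i) again,
$$(\overline{\Omega}_W,\sigma)=(\overline{\Omega}_W,\sigma_{L^\infty})=(\overline{\Omega}_W,\sigma_{L^1})=(\overline{\Omega}_W,\sigma_{L^2}). \tag{9.18}$$
Moreover, the weak closure $\overline{\Omega}_W$ is the same as the weak closure of $\Omega_W$ in
$L^p(\Omega,\mu)$ for any $p<\infty$, and the weak$^*$ closure in $L^\infty(\Omega,\mu)$.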

Recall also that two metrics $r_1$ and $r_2$ on the same space are equivalent
if they induce the same topology, i.e., if $r_1(x_n,x)\to0\iff r_2(x_n,x)\to0$
for any point $x$ and sequence $(x_n)$ in the space; the metrics are uniformly
equivalent if $r_1(x_n,y_n)\to0\iff r_2(x_n,y_n)\to0$ for any sequences $(x_n)$ and
$(y_n)$.

Lovasz and Szegedy [51] showed essentially the following.

Theorem 9.13. (i) $r_{W\circ W}$ is a metric on $\Omega_W$ and it defines the weak
topology $\sigma$ on $\Omega_W$. The same holds on the weak closure $\overline{\Omega}_W$.

(ii) The metric space $(\overline{\Omega}_W,r_{W\circ W})$ is compact. Thus $(\overline{\Omega}_W,r_{W\circ W})$ is
the completion of $(\Omega_W,r_{W\circ W})$. In particular, $\overline{\Omega}_W=\Omega_W$ if and only if
$(\Omega_W,r_{W\circ W})$ is complete.

(iii) The inequality $r_W\ge r_{W\circ W}$ holds, and thus the identity mapping
$(\Omega_W,r_W)\to(\Omega_W,r_{W\circ W})$ is uniformly continuous.

(iv) $\Omega_W$ is compact if and only if the metrics $r_W$ and $r_{W\circ W}$ are equivalent
on $\Omega_W$ and further $\Omega_W$ is weakly closed, $\overline{\Omega}_W=\Omega_W$.

(v) The metrics $r_W$ and $r_{W\circ W}$ are uniformly equivalent on $\Omega_W$ if and
only if $\Omega_W$ is compact for the norm topology given by $r_W$.

It seems more difficult to characterize when $r_W$ and $r_{W\circ W}$ are equivalent
on $\Omega_W$; see Example 9.16 and the subsequent examples below.
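The gap between the weak topology (which $r_{W\circ W}$ defines) and the norm topology of $r_W$ can be seen on $[0,1]$ with Lebesgue measure. The Python sketch below is an illustration, not part of the theory: it uses a metric of the form $d(f,g)=\sum_i 2^{-i}|\langle f-g,h_i\rangle|$, which metrizes the weak topology on bounded sets when $(h_i)$ is total; here it is truncated to the test functions $h_i(x)=x^{i-1}$, $i\le8$, with integrals approximated by a midpoint rule. All names are hypothetical. The square waves $f_n=\mathbf 1\{\sin(2\pi nx)>0\}$ take values in $[0,1]$ and approach the constant $1/2$ in $d$, while their $L^1$ distance to $1/2$ stays equal to $1/2$.

```python
import math

N = 4096  # number of midpoint cells on [0, 1]
GRID = [(k + 0.5) / N for k in range(N)]


def inner(f, h):
    """Midpoint-rule approximation of <f, h> = int_0^1 f h dx."""
    return sum(f(x) * h(x) for x in GRID) / N


def weak_metric(f, g, tests):
    """Truncated d(f, g) = sum_i 2^{-i} |<f - g, h_i>| over the given test functions."""
    return sum(2.0 ** -(i + 1) * abs(inner(lambda x: f(x) - g(x), h))
               for i, h in enumerate(tests))


def l1_dist(f, g):
    """Midpoint-rule approximation of the L^1 distance."""
    return sum(abs(f(x) - g(x)) for x in GRID) / N


def square_wave(n):
    return lambda x: 1.0 if math.sin(2 * math.pi * n * x) > 0 else 0.0


def half(x):
    return 0.5


tests = [lambda x, j=j: x ** j for j in range(8)]  # h_i(x) = x^(i-1)
```

Evaluating `weak_metric(square_wave(n), half, tests)` for increasing `n` gives values shrinking roughly like $1/n$, while `l1_dist(square_wave(n), half)` stays at $0.5$: weak convergence without norm convergence.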

Before proving the theorem, we introduce more notation. Using the fact that $\Omega_W \subset L^2(\Omega,\mu)$ (cf. Remark 9.11), let $A$ be the closed linear span of $\Omega_W$ in $L^2(\Omega,\mu)$, and let $B$ be the unit ball of $A$; thus $\Omega_W \subset B$. We extend the definition (9.13) of $r_{W\circ W}$ to all $f, g \in A$.
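A small illustration of these objects, under simplifying assumptions: take $\Omega = [n]$ with the uniform probability measure, so that a graphon is a symmetric $n\times n$ matrix and $W_x$ is its row $x$. We read (9.13), as it is used in the proof of Lemma 9.14 below, as $r_{W\circ W}(f,g) = \int_{\Omega_W}|\langle f-g,h\rangle|\,d\mu_W(h)$; for $f = W_x$ and $g = W_y$ this is the average over $z$ of $|\langle W_x - W_y, W_z\rangle|$, which is also the $L^1$ distance between rows $x$ and $y$ of the kernel $W\circ W(x,y) = \int W(x,z)W(z,y)\,d\mu(z)$. The helper names are ours, for illustration only; exact fractions make the inequality $r_{W\circ W} \le r_W$ of Theorem 9.13(iii) easy to check.

```python
from fractions import Fraction
from itertools import combinations

def inner(f, g):
    """<f, g> in L^2([n]) for the uniform probability measure on [n]."""
    return sum(a * b for a, b in zip(f, g)) / len(f)

def r_W(W, x, y):
    """r_W(x, y) = ||W_x - W_y||_{L^1}: L^1 distance between rows x and y."""
    n = len(W)
    return sum(abs(W[x][k] - W[y][k]) for k in range(n)) / n

def r_WoW(W, x, y):
    """r_{W o W}(W_x, W_y): average over z of |<W_x - W_y, W_z>|."""
    n = len(W)
    diff = [W[x][k] - W[y][k] for k in range(n)]
    return sum(abs(inner(diff, W[z])) for z in range(n)) / n

# A symmetric 3-step graphon with exact rational values.
W = [[Fraction(v) for v in row] for row in
     [[1, 0, Fraction(1, 2)], [0, 1, Fraction(1, 4)], [Fraction(1, 2), Fraction(1, 4), 0]]]

for x, y in combinations(range(len(W)), 2):
    # Theorem 9.13(iii): r_{W o W} <= r_W.
    assert r_WoW(W, x, y) <= r_W(W, x, y)
```

The inequality holds in general because $|\langle W_x - W_y, W_z\rangle| \le \int |W_x - W_y|\,W_z \le r_W(x,y)$ when $0 \le W_z \le 1$.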

Lemma 9.14. (i) $r_{W\circ W}$ is a metric on $A$.

(ii) The metric $r_{W\circ W}$ defines the weak topology $\sigma_{L^2}$ on $B$. In other words, $(B, r_{W\circ W}) = (B, \sigma_{L^2})$ as topological spaces.

(iii) The metric space $(B, r_{W\circ W})$ is compact.

Proof. (i): Symmetry and the triangle inequality are immediate from the definition (9.13). Suppose that $r_{W\circ W}(f,g) = 0$ for some $f, g \in A$. Since $h \mapsto |\langle f-g,h\rangle|$ is continuous on $\Omega_W$ and its integral $r_{W\circ W}(f,g)$ is 0, it follows from Theorem 9.1(ii) that $\langle f-g,h\rangle = 0$ for every $h \in \Omega_W$. The set $\{h \in L^2 : \langle f-g,h\rangle = 0\}$ is a closed linear subspace of $L^2(\Omega,\mu)$, and thus it contains $A$; i.e., $\langle f-g,h\rangle = 0$ for every $h \in A$. In particular,

$$\int_\Omega |f-g|^2\,d\mu = \langle f-g, f-g\rangle = 0.$$

Thus $f - g = 0$ a.e., i.e., $f = g$ in $A \subset L^2$. Hence $r_{W\circ W}$ is a metric.

(ii): If $h \in L^2(\Omega)$ and $h_A \in A$ is the orthogonal projection of $h$, then $\langle f,h\rangle = \langle f,h_A\rangle$ for every $f \in A$. Consequently, $\sigma_{L^2} = \sigma_A$ on $A$.

Let $D$ be a countable dense subset of $\Omega_W$; then $D$ is total in $A$, and thus $A$ is a separable Hilbert space. It is a standard fact that the unit ball $B$ of $A$ with the weak topology $\sigma_A = \sigma_{L^2}$ then is a compact metric space. (It is compact by the Banach-Alaoglu theorem [25, Theorem V.4.2], and metrizable by [25, Theorem V.5.1]. Explicitly, $\sigma_A = \sigma_D$ on $B$, by the same argument as in the proof of Lemma F.1, and if $D = \{h_1, h_2, \dots\}$, we can define a metric on $(B,\sigma_{L^2}) = (B,\sigma_A)$ by $d(f,g) := \sum_i 2^{-i}|\langle f-g, h_i\rangle|$.)

We next show that the identity map $(B,\sigma_{L^2}) \to (B, r_{W\circ W})$ is continuous. Since, as just shown, $(B,\sigma_{L^2})$ is metrizable, it suffices to consider sequential continuity. Thus assume that $f_n, f \in B$ and $f_n \to f$ in $\sigma_{L^2}$. Then $\langle f_n - f, h\rangle \to 0$ as $n \to \infty$ for every $h \in \Omega_W \subset L^2$, and thus $r_{W\circ W}(f_n, f) \to 0$ by (9.13) and dominated convergence (note that $|\langle f_n - f, h\rangle| \le \|f_n - f\|_{L^2}\|h\|_{L^2} \le 2$).

The identity map $(B,\sigma_{L^2}) \to (B, r_{W\circ W})$ is thus a continuous bijection of a compact space onto a Hausdorff space, and it is thus a homeomorphism. Consequently, $(B,\sigma_{L^2}) = (B, r_{W\circ W})$.

(iii): A consequence of (ii) and its proof, where we showed that $(B,\sigma_{L^2})$ is compact. □



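Proof of Theorem 9.13. (i) Since $\Omega_W \subseteq B \subset A$, it follows by Lemma 9.14 that $\overline{\Omega}_W \subseteq B$. Hence, using Lemma 9.14 again and (9.18), $r_{W\circ W}$ is a metric on $\overline{\Omega}_W$ and $(\overline{\Omega}_W,\sigma) = (\overline{\Omega}_W,\sigma_{L^2}) = (\overline{\Omega}_W, r_{W\circ W})$.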



(ii) An immediate consequence of Lemma 9.14 together with standard facts on compact and complete metric spaces (see e.g. [27, Section 4.3]).

(iii) This is just (9.16).

(iv) If $(\Omega_W, r_W)$ is compact, then the identity mapping $(\Omega_W, r_W) \to (\Omega_W, r_{W\circ W})$, which is continuous by (iii), is a homeomorphism. The metrics are thus equivalent. Furthermore, $(\Omega_W,\sigma) = (\Omega_W, r_{W\circ W})$ is compact and thus closed in the weak topology on $L^1(\Omega,\mu)$.

Conversely, if $\overline{\Omega}_W = \Omega_W$, then $(\Omega_W, r_{W\circ W})$ is compact by (ii), and if further the metrics are equivalent, then $(\Omega_W, r_W)$ is compact too.



(v) If $(\Omega_W, r_W)$ is compact, then the metrics $r_W$ and $r_{W\circ W}$ on $\Omega_W$ are equivalent, as seen in the proof of (iv). Moreover, as is easily seen (e.g. [27, Theorem 4.3.32]), two equivalent metrics on a compact metric space are uniformly equivalent.

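Conversely, if $r_W$ and $r_{W\circ W}$ are uniformly equivalent, then $(\Omega_W, r_{W\circ W})$ is a complete metric space, since $(\Omega_W, r_W)$ is complete by Theorem 9.1; hence $\overline{\Omega}_W = \Omega_W$ by (ii), and thus $(\Omega_W, r_{W\circ W})$ is compact by (ii) again. Since the metrics are equivalent, $(\Omega_W, r_W)$ is compact too. □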



The following analogue of Corollary 9.8 shows that also the metric space $(\Omega_W, r_{W\circ W})$ and its completion, the compact metric space $(\overline{\Omega}_W, r_{W\circ W})$, are invariants of graph limits.



Theorem 9.15. If $W_1$ and $W_2$ are equivalent graphons, then $(\Omega_{W_1}, r_{W_1\circ W_1})$ and $(\Omega_{W_2}, r_{W_2\circ W_2})$ are isometric metric spaces, and so are the compact metric spaces $(\overline{\Omega}_{W_1}, r_{W_1\circ W_1})$ and $(\overline{\Omega}_{W_2}, r_{W_2\circ W_2})$.



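Proof. By Theorem 8.3 it suffices to prove this in the case when $W_1$ is a pull-back of $W_2$ as in Lemma 9.5. In this case, for any $f, g \in \Omega_{W_2}$, using (9.13) and the fact that $f \mapsto f^\varphi$ is a measure-preserving bijection of $\Omega_{W_2}$ onto $\Omega_{W_1}$ by Lemma 9.3,

$$r_{W_1\circ W_1}(f^\varphi, g^\varphi) = \int_{\Omega_{W_1}} |\langle f^\varphi - g^\varphi, k\rangle|\,d\mu_{W_1}(k) = \int_{\Omega_{W_2}} |\langle f^\varphi - g^\varphi, k^\varphi\rangle|\,d\mu_{W_2}(k) = \int_{\Omega_{W_2}} |\langle f - g, k\rangle|\,d\mu_{W_2}(k) = r_{W_2\circ W_2}(f,g);$$

thus the bijection $f \mapsto f^\varphi$ is also an isometry $(\Omega_{W_2}, r_{W_2\circ W_2}) \to (\Omega_{W_1}, r_{W_1\circ W_1})$. This extends to an isometric bijection of $(\overline{\Omega}_{W_2}, r_{W_2\circ W_2})$ onto $(\overline{\Omega}_{W_1}, r_{W_1\circ W_1})$ by Theorem 9.13(ii). □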

We say that a graphon $W$ is compact if $\Omega_W$ is a compact metric space with the standard $L^1$ metric $r_W$, and weakly compact if $(\Omega_W,\sigma) = (\Omega_W, r_{W\circ W})$ is compact. By Corollary 9.8 and Theorem 9.15, the same then holds for every equivalent graphon, so we may say that a graph limit is [weakly] compact if some, and thus any, representing graphon is [weakly] compact.

Not every graphon is compact. Moreover, this can happen both with $(\Omega_W,\sigma)$ compact and with $(\Omega_W,\sigma)$ non-compact, as shown by the following examples (inspired by a similar example in [?]). Note that exactly one of the two conditions in Theorem 9.13(iv) fails in each of the two examples.



Example 9.16. Let $\Omega := \{0,1\}^\infty = \{x = (x_i)_0^\infty : x_i \in \{0,1\}\}$ (the Cantor cube, which is homeomorphic to the Cantor set) with the product measure $\mu := \nu^\infty$, where $\nu\{0\} = \nu\{1\} = 1/2$. We write $\Omega = \Omega_0 \cup \Omega_1$, where $\Omega_j := \{x \in \Omega : x_0 = j\}$. Note that there is a measure-preserving map $[0,1] \to \Omega$ given by the binary expansion, so the examples below can be translated to examples on $[0,1]$ by taking pull-backs.

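If $F$ is a function $\Omega_0 \times \Omega_1 \to [-1,1]$, we define a graphon $W$ on $\Omega$ by

$$W(x,y) := \begin{cases} \frac12 + \frac12 F(x,y), & x \in \Omega_0,\ y \in \Omega_1,\\ \frac12 + \frac12 F(y,x), & x \in \Omega_1,\ y \in \Omega_0,\\ \frac12, & x,y \in \Omega_0 \text{ or } x,y \in \Omega_1. \end{cases}$$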

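Define $F_x(y) := F(x,y)$ for $x \in \Omega_0$, $y \in \Omega_1$, and $F_x(y) := F(y,x)$ for $x \in \Omega_1$, $y \in \Omega_0$; thus $F_x \in L^1(\Omega_1)$ for $x \in \Omega_0$ and $F_x \in L^1(\Omega_0)$ for $x \in \Omega_1$.

Regard $L^1(\Omega_0)$ and $L^1(\Omega_1)$ as subspaces of $L^1(\Omega)$ in the obvious way (extending functions by 0). Define the maps $\Phi_0 : \Omega_0 \to L^1(\Omega_1)$ and $\Phi_1 : \Omega_1 \to L^1(\Omega_0)$ by $\Phi_0(x) = F_x$ and $\Phi_1(x) = F_x$; then

$$\psi_W(x) = W_x = \tfrac12 + \tfrac12\Phi_j(x), \qquad x \in \Omega_j. \tag{9.19}$$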

Let $\mu_j$ be the push-forward of $\mu$ (restricted to $\Omega_j$) by $\Phi_j$; this is a measure on $L^1(\Omega_{1-j}) \subset L^1(\Omega)$ with total mass $1/2$. Let $X_j \subset L^1(\Omega_{1-j}) \subset L^1(\Omega)$ be the support of $\mu_j$.

It follows from (9.19) that the map $f \mapsto \frac12 + \frac12 f$ is measure-preserving $(L^1(\Omega), \mu_0 + \mu_1) \to (L^1(\Omega), \mu_W)$, and thus

$$\Omega_W = \{\tfrac12 + \tfrac12 f : f \in X_0 \cup X_1\}. \tag{9.20}$$

Define the functions $h_i : \Omega_1 \to \{-1,1\}$ by $h_i(x) = 2x_i - 1$, where $x \mapsto x_i$ is the $i$:th coordinate function. Let $\ell(x) := \inf\{i : x_i = 1\}$ (defined a.e. on $\Omega$) and take

$$F(x,y) := h_{\ell(x)}(y). \tag{9.21}$$

(Thus $F(x,y) = 2y_{\ell(x)} - 1$ for $x \in \Omega_0$, $y \in \Omega_1$.)

Then $\{\Phi_0(x) : x \in \Omega_0\} = \{h_i : i \ge 1\}$. The induced measure $\mu_0$ on $L^1(\Omega_1)$ is thus a discrete measure with atoms $h_i$ (each with positive measure), so

$$X_0 = \operatorname{supp}\mu_0 = \overline{\{h_i : i \ge 1\}} = \{h_i : i \ge 1\},$$

since $\{h_i\}$ is closed in $L^1(\Omega_1)$ because $\|h_i - h_j\|_{L^1(\Omega_1)} = 1/2$ when $i \ne j$.
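If $y, z \in \Omega_1$, then, since $\mu\{x \in \Omega_0 : \ell(x) = i\} = 2^{-i-1}$ and $|h_i(y) - h_i(z)| = 2|y_i - z_i|$,

$$\|F_y - F_z\|_{L^1} = \int_{\Omega_0} |F(x,y) - F(x,z)|\,d\mu(x) = \sum_{i=1}^\infty 2^{-i-1}\,|h_i(y) - h_i(z)| = \sum_{i=1}^\infty 2^{-i}|y_i - z_i|.$$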

It is easily seen that this is a metric on $\Omega_1$ which defines the product topology. Hence $\Phi_1 : y \mapsto F_y$ is a homeomorphism of $\Omega_1$ onto $\{F_y : y \in \Omega_1\} \subset L^1(\Omega_0)$, and consequently $\{F_y : y \in \Omega_1\}$ is a compact subset of $L^1$. Since further $\Phi_1 : (\Omega_1,\mu) \to (L^1(\Omega),\mu_1)$ is measure-preserving, and $\mu$ has full support on $\Omega_1$, it follows that $X_1 = \operatorname{supp}\mu_1 = \{F_y : y \in \Omega_1\} \cong \Omega_1 \cong \Omega$ (where $\cong$ denotes homeomorphism). Note also that $X_0$ and $X_1$ are disjoint; in fact, they have distance 1 in $L^1$.
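The computation of $\|F_y - F_z\|_{L^1}$ above can be checked exactly on a finite truncation of the Cantor cube. The sketch below is our own illustration; the truncation level `K`, and the convention of ignoring the single cylinder with $\ell(x) > K$, are assumptions of the sketch and not part of the example. It enumerates the cylinders of $\Omega_0$ of length $K+1$, each of mass $2^{-(K+1)}$, and compares the result with $\sum_{i=1}^K 2^{-i}|y_i - z_i|$.

```python
from fractions import Fraction
from itertools import product

def ell(x):
    """l(x) = inf{i : x_i = 1}, or None if x has no 1 among its coordinates."""
    for i, xi in enumerate(x):
        if xi == 1:
            return i
    return None

def h(i, y):
    """h_i(y) = 2 y_i - 1, a {-1, 1}-valued function on Omega_1."""
    return 2 * y[i] - 1

def dist_F(y, z, K):
    """||F_y - F_z||_{L^1(Omega_0)} with F(x, y) = h_{l(x)}(y), truncated at K.

    x runs over the cylinders (0, x_1, ..., x_K) of Omega_0, each of mass
    2^-(K+1); the all-zero cylinder (where l(x) > K) is ignored.
    """
    total = Fraction(0)
    for tail in product((0, 1), repeat=K):
        i = ell((0,) + tail)
        if i is not None:
            total += Fraction(1, 2 ** (K + 1)) * abs(h(i, y) - h(i, z))
    return total

def closed_form(y, z, K):
    """sum_{i=1}^K 2^-i |y_i - z_i|."""
    return sum(Fraction(abs(y[i] - z[i]), 2 ** i) for i in range(1, K + 1))

K = 8
y = (1, 0, 1, 1, 0, 1, 0, 0, 1)  # y_0 = 1, so y lies in Omega_1
z = (1, 1, 1, 0, 0, 1, 1, 0, 0)
assert dist_F(y, z, K) == closed_form(y, z, K)
```

The cylinders with $\ell(x) = i$ have total mass $2^{-i-1}$, which is exactly the weight used in the displayed sum.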

It follows that $\Omega_W \cong X_0 \cup X_1 \cong \mathbb{N} \cup \Omega$, i.e., $\Omega_W$ is homeomorphic to the disjoint union of the Cantor cube (or Cantor set) and a sequence of discrete points. Thus $\Omega_W$ is not compact.

With the weak topology $\sigma$, we have $(X_1,\sigma) = (X_1, r_W)$ because $(X_1, r_W)$ is compact. Moreover, the sequence $(h_i)$ is orthonormal in $L^2(\Omega_1, 2\mu)$ (for convenience normalizing the measure on $\Omega_1$), and thus $h_i \to 0$ weakly in $L^2$ as $i \to \infty$. It follows by Lemma 9.14(ii) that $r_{W\circ W}(h_i, 0) \to 0$. For the corresponding elements $g_i := \frac12 + \frac12 h_i \in \Omega_W$, see (9.20), we have $r_{W\circ W}(g_i, \frac12) \to 0$. It follows that $(\Omega_W,\sigma)$ consists of a compact set homeomorphic to $\Omega$, and a sequence $(g_i)$ converging to $\frac12$. Since $\frac12 \notin \Omega_W$, it follows that $(\Omega_W,\sigma) = (\Omega_W, r_{W\circ W})$ is not compact; moreover, the identity map $(\Omega_W, r_W) \to (\Omega_W, r_{W\circ W})$ is a homeomorphism, so $(\Omega_W,\sigma) \cong (\Omega_W, r_W) \cong \mathbb{N} \cup \Omega$. Thus $r_W$ and $r_{W\circ W}$ are equivalent on $\Omega_W$ but not uniformly equivalent. (Just as $\{1, 2, \dots\}$ and $\{1, 1/2, 1/3, \dots\}$, both with the usual metric on $\mathbb{R}$, are equivalent but not uniformly so.)

The weak closure $\overline{\Omega}_W = \Omega_W \cup \{\frac12\}$ is the one-point compactification of $(\Omega_W,\sigma)$.

Example 9.17. We modify the preceding example by taking, instead of (9.21),

$$F(x,y) := \begin{cases} 0, & \text{if } x_1 = x_2 = 1,\\ h_{\ell(x)}(y), & \text{otherwise}. \end{cases} \tag{9.22}$$

The only significant difference from the preceding example is that now $X_0$ also contains the function 0, and $\Omega_W$ thus the function $\frac12$; note that $h_i \to 0$ weakly and thus in $r_{W\circ W}$, but not in $r_W$. In the norm topology, $X_0 = \{h_i\} \cup \{0\}$ is still an infinite discrete set, and thus $\Omega_W \cong X_0 \cup X_1 \cong \mathbb{N} \cup \Omega$ as in Example 9.16. (We have added one isolated point to $\Omega_W$.)

In the weak topology, however, $X_0$ now consists of a convergent sequence and its limit point, and thus $(X_0,\sigma)$ is compact and homeomorphic to the one-point compactification $\overline{\mathbb{N}}$ of $\mathbb{N}$ (or, equivalently, to $\{1/n : n \in \mathbb{N}\} \cup \{0\}$ with the usual topology). Thus $(\Omega_W,\sigma) \cong X_0 \cup X_1 \cong \overline{\mathbb{N}} \cup \Omega$. (Compared to Example 9.16, we have added the point at infinity in the one-point compactification.) In particular, $(\Omega_W, r_{W\circ W}) = (\Omega_W,\sigma)$ is compact but $(\Omega_W, r_W)$ is not, and the two topologies are different, so the metrics are not equivalent. The weak closure $\overline{\Omega}_W = \Omega_W$.



10. Random-free graphons 

Lovasz and Szegedy [?] have studied the class of graph limits represented by $\{0,1\}$-valued graphons (and the corresponding graph properties); with a slight variation of their terminology we call such graphons and graph limits random-free (a reason for the name is given in Remark D.2):

Definition 10.1. A random-free graphon is a graphon W with values in 
{0,1} a.e. 

By Corollary 8.12, every graphon equivalent to a random-free graphon is random-free. Note that every graphon $W_G$ defined by a graph as in Example 2.7 is random-free.

Example 10.2. It is shown by Diaconis, Holmes and Janson [22] that every graph limit that is a limit of a sequence of threshold graphs can be represented by a graphon that is random-free (and has a monotonicity property, studied further in [?]). Hence every representing graphon is random-free, i.e., if $G_n$ are threshold graphs and $W$ is a graphon such that $G_n \to W$, then $W$ is random-free.

Example 10.3. It is shown by Diaconis, Holmes and Janson [23] that every graph limit that is a limit of a sequence of interval graphs can be represented by the graphon $W(x,y) := \mathbf{1}\{x \cap y \neq \emptyset\}$ on the space $\Omega := \{[a,b] : 0 \le a \le b \le 1\}$ of all closed subintervals of $[0,1]$, equipped with some Borel probability measure $\mu$. (Note that $\Omega$ and $W$ are fixed, but $\mu$ varies.) Hence every graphon representing an interval graph limit is random-free. (This includes the threshold graph limits in Example 10.2 as a subset. The explicit representations in [13] and [23] are different, however.)

Lemma 10.4. Let $W$ be a graphon. Then the following are equivalent.

(i) $W$ is random-free.

(ii) $\int_{\Omega^2} W(1-W) = 0$.

(iii) $\int_{\Omega^2} W^2 = \int_{\Omega^2} W$.

Proof. This is trivial, noting that $W$ is random-free if and only if $W(1-W) = 0$ a.e., and that $W(1-W) \ge 0$ for every graphon. □
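For step graphons on $[n]$ with the uniform measure, a graphon is a symmetric matrix and the criterion of Lemma 10.4 becomes a finite computation. The following sketch (helper names hypothetical, exact arithmetic via `fractions`) checks conditions (ii) and (iii) for a graph graphon $W_G$ and for the constant graphon $1/2$.

```python
from fractions import Fraction

def integral(W):
    """Integral over [n]^2 (uniform measure) of a step graphon given as a matrix."""
    n = len(W)
    return sum(sum(row) for row in W) / Fraction(n * n)

def random_free_ii(W):
    """Lemma 10.4 (ii): the integral of W(1 - W) vanishes."""
    return integral([[w * (1 - w) for w in row] for row in W]) == 0

def random_free_iii(W):
    """Lemma 10.4 (iii): the integral of W^2 equals the integral of W."""
    return integral([[w * w for w in row] for row in W]) == integral(W)

# W_G for the path 0 - 1 - 2 is {0,1}-valued, hence random-free.
W_path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
# The constant graphon 1/2 is not random-free.
W_half = [[Fraction(1, 2)] * 3 for _ in range(3)]

assert random_free_ii(W_path) and random_free_iii(W_path)
assert not random_free_ii(W_half) and not random_free_iii(W_half)
```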

Recall that $W \mapsto \int_{\Omega^2} W^2$ is not continuous for $\delta_\square$, see Example C.3; we therefore cannot conclude that the set of random-free graphons is closed. In fact, it is not; on the contrary, this set is dense in the space of all graphons.

Lemma 10.5. The set of random-free graphons is dense in the space of all graphons. In other words, given any graphon $W$, on any probability space $\Omega$, there exists a sequence of random-free graphons $W_n$ such that $\delta_\square(W_n, W) \to 0$.

Proof. By Remark B.2 there exists a sequence $(G_n)$ of graphs such that $\delta_\square(W_{G_n}, W) \to 0$. Each $W_{G_n}$ is random-free. □

In contrast, the set is closed in the stronger metric $\delta_1$.

Lemma 10.6. The set of random-free graphons is closed in the space of all graphons equipped with the metric $\delta_1$. In other words, if $W$ and $W_n$ are graphons, on any probability spaces, such that $\delta_1(W_n, W) \to 0$, and every $W_n$ is random-free, then $W$ is random-free.

Proof. Let $F(x) := x(1-x)$. Then $F : [0,1] \to [0,1]$ and $|F'(x)| \le 1$, so $|F(x) - F(y)| \le |x - y|$ for $x, y \in [0,1]$. It follows easily that if $W_n$ and $W$ are graphons with $\delta_1(W_n, W) \to 0$, then $\delta_1(F(W_n), F(W)) \le \delta_1(W_n, W) \to 0$ and $|\int F(W_n) - \int F(W)| \to 0$. Since $W_n$ is random-free, $\int F(W_n) = 0$ by Lemma 10.4 for each $n$, and thus $\int F(W) = 0$. By Lemma 10.4 again, this shows that $W$ is random-free. □

We continue to investigate the metric $\delta_1$ in connection with random-free graphons.

Lemma 10.7. Let $W_1$ and $W_2$ be graphons on a probability space $\Omega$, and let $W_1'$ be a random-free $n$-step graphon on the same space. Then

$$\|W_1 - W_2\|_{L^1(\Omega^2)} \le n^2\|W_1 - W_2\|_\square + 2\|W_1 - W_1'\|_{L^1(\Omega^2)}. \tag{10.1}$$
Proof. Let $\{A_i\}_1^n$ be a partition of $\Omega$ such that $W_1'$ is constant 0 or 1 on each $A_i \times A_j$.

If $W_1' = 0$ on $A_i \times A_j$, then

$$\int_{A_i\times A_j} |W_1' - W_2| = \int_{A_i\times A_j} W_2 \le \int_{A_i\times A_j} W_1 + \|W_1 - W_2\|_\square = \int_{A_i\times A_j} |W_1 - W_1'| + \|W_1 - W_2\|_\square.$$

If $W_1' = 1$ on $A_i \times A_j$, then

$$\int_{A_i\times A_j} |W_1' - W_2| = \int_{A_i\times A_j} (1 - W_2) \le \int_{A_i\times A_j} (1 - W_1) + \|W_1 - W_2\|_\square = \int_{A_i\times A_j} |W_1 - W_1'| + \|W_1 - W_2\|_\square.$$

Thus, in both cases $\int_{A_i\times A_j} |W_1' - W_2| \le \int_{A_i\times A_j} |W_1 - W_1'| + \|W_1 - W_2\|_\square$, and summing over all $i$ and $j$ yields

$$\|W_1' - W_2\|_{L^1} \le \|W_1 - W_1'\|_{L^1} + n^2\|W_1 - W_2\|_\square.$$

The result follows by $\|W_1 - W_2\|_{L^1} \le \|W_1 - W_1'\|_{L^1} + \|W_1' - W_2\|_{L^1}$. □
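For step graphons on $[n]$ with the uniform measure, the cut norm is a maximum over pairs of subsets $S, T \subseteq [n]$, and for a fixed $S$ the best $T$ consists of the columns whose partial sums are positive (or of those whose partial sums are negative). The brute-force sketch below uses this to compute $\|\cdot\|_\square$ in $O(2^n n^2)$ time and then checks (10.1) on a few random instances, taking $W_1'$ to be $W_1$ rounded to $\{0,1\}$; that rounding is an arbitrary choice for illustration, since the lemma holds for any random-free $n$-step $W_1'$. It is meant only for very small $n$.

```python
import itertools
import random

def cut_norm(M):
    """||M||_box = max_{S,T} |sum_{i in S, j in T} M[i][j]| / n^2 on [n]."""
    n = len(M)
    best = 0.0
    for S in itertools.product((0, 1), repeat=n):
        col = [sum(M[i][j] for i in range(n) if S[i]) for j in range(n)]
        # Best T for this S: all positive columns, or all negative columns.
        best = max(best, sum(c for c in col if c > 0), -sum(c for c in col if c < 0))
    return best / n ** 2

def l1_norm(M):
    """||M||_{L^1([n]^2)} for the uniform measure."""
    return sum(abs(v) for row in M for v in row) / len(M) ** 2

def random_graphon(n, rng):
    """A random symmetric n x n matrix with entries in [0, 1]."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            M[i][j] = M[j][i] = rng.random()
    return M

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

rng = random.Random(0)
n = 6
for _ in range(5):
    W1, W2 = random_graphon(n, rng), random_graphon(n, rng)
    W1p = [[1.0 if v >= 0.5 else 0.0 for v in row] for row in W1]  # random-free
    lhs = l1_norm(sub(W1, W2))
    rhs = n ** 2 * cut_norm(sub(W1, W2)) + 2 * l1_norm(sub(W1, W1p))
    assert lhs <= rhs + 1e-12  # inequality (10.1)
```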

Remark 10.8. In particular, if $W_1$ is a random-free $n$-step graphon and $W_2$ an arbitrary graphon on the same probability space, then

$$\|W_1 - W_2\|_{L^1} \le n^2\|W_1 - W_2\|_\square. \tag{10.2}$$

The constant $n^2$ in Lemma 10.7 and (10.2) is good enough for our purposes, but it is not the best possible, and it may easily be improved. In fact, an inspection of the proof shows that if we let $a_{ij} := \int_{A_i\times A_j}(W_1 - W_2)$, then we have simply estimated $|a_{ij}| \le \|W_1 - W_2\|_\square$ and thus $\sum_{i,j}|a_{ij}| \le n^2\|W_1 - W_2\|_\square$. To obtain a better estimate, we use an inequality by Littlewood [46], see also [?] and [?, §6.2], which yields

$$\sum_i\Big(\sum_j |a_{ij}|^2\Big)^{1/2} \le \sqrt3\sup_{\varepsilon_i,\varepsilon'_j = \pm1}\Big|\sum_i\sum_j \varepsilon_i\varepsilon'_j a_{ij}\Big| \le \sqrt3\,\|W_1 - W_2\|_{\square,2}. \tag{10.3}$$

Consequently, by the Cauchy-Schwarz inequality,

$$\sum_{i=1}^n\sum_{j=1}^n |a_{ij}| \le \sum_{i=1}^n n^{1/2}\Big(\sum_{j=1}^n |a_{ij}|^2\Big)^{1/2} \le \sqrt3\,n^{1/2}\|W_1 - W_2\|_{\square,2}, \tag{10.4}$$

which shows that $n^2\|W_1 - W_2\|_\square$ in (10.1) and (10.2) can be replaced by $\sqrt3\,n^{1/2}\|W_1 - W_2\|_{\square,2}$. Furthermore, the constant $\sqrt3$, which is implicit in [46], has been improved to $\sqrt2$ by Szarek [?]. (Szarek actually proved that $\sqrt2$ is the sharp constant in Khinchin's inequality, which implies Littlewood's; see [?]. See also [12] for related results.) Consequently, $n^2\|W_1 - W_2\|_\square$ in (10.1) and (10.2) can be replaced by $\sqrt2\,n^{1/2}\|W_1 - W_2\|_{\square,2}$.
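The mixed inequality behind (10.3) can be tested numerically. The sketch below (illustrative names; it uses Szarek's constant $\sqrt2$ rather than $\sqrt3$) computes both sides for small random real matrices $(a_{ij})$, maximizing over sign vectors by brute force: for fixed $(\varepsilon_i)$ the best $(\varepsilon'_j)$ is the sign of each column sum. It also checks the resulting bound $\sum_{i,j}|a_{ij}| \le \sqrt{2n}\,\sup_{\varepsilon,\varepsilon'}|\sum_{i,j}\varepsilon_i\varepsilon'_j a_{ij}|$ obtained as in (10.4).

```python
import itertools
import math
import random

def mixed_norm(A):
    """sum_i (sum_j a_ij^2)^(1/2)."""
    return sum(math.sqrt(sum(a * a for a in row)) for row in A)

def sign_sup(A):
    """max over eps_i, eps'_j in {-1, 1} of |sum_ij eps_i eps'_j a_ij|."""
    n, m = len(A), len(A[0])
    best = 0.0
    for eps in itertools.product((-1, 1), repeat=n):
        col = [sum(eps[i] * A[i][j] for i in range(n)) for j in range(m)]
        best = max(best, sum(abs(c) for c in col))  # eps'_j = sign(col_j)
    return best

rng = random.Random(1)
for n in range(1, 7):
    A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    s = sign_sup(A)
    assert mixed_norm(A) <= math.sqrt(2) * s + 1e-12  # Littlewood, constant sqrt(2)
    assert sum(abs(a) for row in A for a in row) <= math.sqrt(2 * n) * s + 1e-12
```

With $a_{ij} = \int_{A_i\times A_j}(W_1 - W_2)$, the quantity `sign_sup(A)` is at most $\|W_1 - W_2\|_{\square,2}$, as in (10.3).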

This is, within a numerical constant, the best constant in these inequalities, as shown by the following examples, which all yield a lower bound of order $n^{1/2}$.

Example 10.9. Let $W$ be a symmetric Hadamard matrix of order $n$ (i.e., a matrix with $\pm1$ entries and all rows orthogonal); such matrices exist at least if $n = 2^k$ for some $k$. (Take tensor powers of $\begin{pmatrix}1 & 1\\ 1 & -1\end{pmatrix}$.) We have $W = W_+ - W_-$, where $W_\pm$ are graphons on $[n]$. (We equip $[n]$ with the uniform probability distribution.)

Then $\|W\|_{L^1} = 1$, since $|W| = 1$. In order to estimate $\|W\|_{\square,2}$, let $f$ and $g$ be two functions $[n] \to [-1,1]$; see the definition (4.3). Write $W = (w_{ij})_{i,j=1}^n$ and change notation to $a_i = f(i)$, $b_j = g(j)$; thus $|a_i|, |b_j| \le 1$.

Since $W$ is a Hadamard matrix, the normalized matrix $n^{-1/2}W$ is orthogonal, and is thus an isometry as an operator in $\mathbb{R}^n$ (with the usual Euclidean norm); hence, $W$ has norm $\sqrt n$. Consequently,

$$\Big|\int W(x,y)f(x)g(y)\,d\mu(x)\,d\mu(y)\Big| = n^{-2}\Big|\sum_{i,j=1}^n a_i w_{ij} b_j\Big| \le n^{-2}\,n^{1/2}\Big(\sum_{i=1}^n a_i^2\Big)^{1/2}\Big(\sum_{j=1}^n b_j^2\Big)^{1/2} \le n^{-1/2}.$$

Hence $\|W\|_{\square,2} \le n^{-1/2}$ while $\|W\|_{L^1} = 1$, and thus the best constant in (10.2) is at least $\sqrt n$ for $n$ such that a symmetric Hadamard matrix exists, and hence at least $\sqrt{n/2}$ for any $n$. See further [46] and [59, §6.3].
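As a quick numerical companion to Example 10.9, the sketch below builds the Sylvester matrices (tensor powers of $\begin{pmatrix}1 & 1\\ 1 & -1\end{pmatrix}$) and computes $\|W\|_{\square,2}$ by brute force. Since the bilinear form is maximized at sign vectors, $\|W\|_{\square,2} = n^{-2}\max_{a\in\{\pm1\}^n}\sum_j|\sum_i a_i w_{ij}|$. The helper names are ours, and the exhaustive search limits the sketch to $n \le 8$ or so.

```python
import itertools

def sylvester_hadamard(k):
    """Symmetric Hadamard matrix of order 2^k: [[H, H], [H, -H]] iterated."""
    H = [[1]]
    for _ in range(k):
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def cut_norm_2(W):
    """||W||_{box,2} on [n] (uniform measure), maximizing over sign vectors."""
    n = len(W)
    best = 0
    for a in itertools.product((-1, 1), repeat=n):
        best = max(best, sum(abs(sum(a[i] * W[i][j] for i in range(n))) for j in range(n)))
    return best / n ** 2

for k in range(1, 4):
    H = sylvester_hadamard(k)
    n = len(H)
    assert all(H[i][j] == H[j][i] for i in range(n) for j in range(n))
    # Rows are orthogonal, as required of a Hadamard matrix.
    assert all(sum(H[i][l] * H[j][l] for l in range(n)) == (n if i == j else 0)
               for i in range(n) for j in range(n))
    assert cut_norm_2(H) <= n ** -0.5 + 1e-12  # ||W||_{box,2} <= n^{-1/2}
```

For $n = 4$ the bound is attained: $a = (1,1,1,-1)$ gives $\sum_j|\sum_i a_i w_{ij}| = 8$, so $\|W\|_{\square,2} = 8/16 = n^{-1/2}$.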

Example 10.10. Let $q$ be a prime power with $q \equiv 1 \pmod 4$ and consider the Paley graph $P_q$, see [1, Section 13.2]; the vertex set of $P_q$ is the finite field $\mathbb{F}_q$ and there is an edge $xy$ if $x - y$ is a square in $\mathbb{F}_q$. Let $W_1 := W_{P_q}$ and $W_2 := 1/2$; then $W_1$ is a random-free $q$-step graphon, and $\|W_1 - W_2\|_{L^1} = 1/2$, since $W_1 - W_2 = \pm1/2$ everywhere. By [?, Theorem 13.13] (and its proof, or Lemma E.11 below), $\|W_1 - W_2\|_\square = O(q^{-1/2})$. Hence, the constant in (10.2) is at least $\Omega(q^{1/2})$, for $n = q$ of this type. Since primes of the form $4k+1$ are sufficiently dense in the natural numbers (for all large $n$ there is one in $(n/2, n]$), it follows again that the constant is $\Omega(n^{1/2})$ for all $n$.
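For small primes $q \equiv 1 \pmod 4$ the bound can be checked directly. Writing $W_1 - W_2 = \frac12(S - I)$ as a $q\times q$ matrix, where $S_{ij}$ is the quadratic character of $i - j$ (and $S_{ii} = 0$), one has $S^2 = qI - J$ and $S\mathbf{1} = 0$, so the operator norm of $W_1 - W_2$ is at most $(\sqrt q + 1)/2$ and hence $\|W_1 - W_2\|_\square \le (1 + \sqrt q)/(2q)$. This elementary spectral estimate is our own shortcut, not the argument of the cited reference; the sketch below (restricted to prime $q$, with hypothetical helper names) compares it with a brute-force cut norm.

```python
import itertools
import math

def paley_adjacency(q):
    """Adjacency matrix of the Paley graph P_q for a prime q = 1 (mod 4)."""
    squares = {(x * x) % q for x in range(1, q)}
    return [[1 if i != j and (i - j) % q in squares else 0 for j in range(q)]
            for i in range(q)]

def cut_norm(M):
    """||M||_box on [n] with uniform measure; for fixed S take the best T."""
    n = len(M)
    best = 0.0
    for S in itertools.product((0, 1), repeat=n):
        col = [sum(M[i][j] for i in range(n) if S[i]) for j in range(n)]
        best = max(best, sum(c for c in col if c > 0), -sum(c for c in col if c < 0))
    return best / n ** 2

for q in (5, 13):
    A = paley_adjacency(q)
    assert all(A[i][j] == A[j][i] for i in range(q) for j in range(q))
    D = [[a - 0.5 for a in row] for row in A]  # W_1 - W_2 as a q x q matrix
    assert sum(abs(v) for row in D for v in row) / q ** 2 == 0.5
    assert cut_norm(D) <= (1 + math.sqrt(q)) / (2 * q) + 1e-12
```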

Example 10.11. We can use a random graph $G = G(n, 1/2)$ and let $W_1 := W_G$ and, again, $W_2 := 1/2$. Thus $\|W_1 - W_2\|_{L^1} = 1/2$. (Note that the Paley graph in Example 10.10 is an example of a quasirandom graph, so the two examples are related.)

We use for convenience the version $\|\cdot\|_{\square,4}$ of the cut norm in Appendix E. If $S, T \subseteq [n]$ are disjoint, then $n^2\int_{S\times T} W_G$ is the number of edges between $S$ and $T$, and has thus a binomial distribution $\mathrm{Bi}(st, 1/2)$, where $s := |S|$ and $t := |T|$. Hence, a Chernoff bound [41, Remark 2.5] shows that, for any $c > 0$,

$$\mathbb{P}\Big(\Big|\int_{S\times T}(W_1 - W_2)\Big| > cn^{-1/2}\Big) = \mathbb{P}\Big(\Big|n^2\int_{S\times T}(W_G - \mathbb{E}W_G)\Big| > cn^{3/2}\Big) \le 2\exp\Big(-\frac{2(cn^{3/2})^2}{st}\Big) \le 2\exp\Big(-\frac{2c^2n^3}{n^2/4}\Big) = 2\exp(-8c^2n),$$

since $st \le s(n-s) \le n^2/4$. There are $3^n$ pairs $S, T$ of disjoint subsets, and thus

$$\mathbb{P}\big(\|W_1 - W_2\|_{\square,4} > cn^{-1/2}\big) \le 2\cdot 3^n\exp(-8c^2n) = 2\exp\big((\log 3 - 8c^2)n\big).$$

Consequently, choosing for simplicity $c = 1$, so that $8c^2 > \log 3$, with high probability $\|W_1 - W_2\|_{\square,4} \le n^{-1/2}$, and thus by Lemma E.2

$$\|W_1 - W_2\|_{\square,1} \le 4n^{-1/2} = 8n^{-1/2}\|W_1 - W_2\|_{L^1},$$

showing that the best constant in (10.2) is at least $\frac18 n^{1/2}$ (for $\|\cdot\|_{\square,1}$).
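The arithmetic in Example 10.11 is easy to check mechanically. The short sketch below is purely illustrative (the range `n_max` is arbitrary): it verifies that the exponent bound $2(cn^{3/2})^2/(st) \ge 8c^2 n$ holds for all admissible $s, t$, and that the union bound $2\cdot3^n e^{-8c^2n}$ with $c = 1$ agrees with $2e^{(\log 3 - 8c^2)n}$ and decreases to 0.

```python
import math

def exponent_bound_holds(n, c):
    """Check 2 (c n^{3/2})^2 / (s t) >= 8 c^2 n for all s, t >= 1 with s + t <= n."""
    target = 8 * c * c * n
    return all(2 * (c * n ** 1.5) ** 2 / (s * t) >= target * (1 - 1e-12)
               for s in range(1, n) for t in range(1, n - s + 1))

def union_bound(n, c):
    """2 * 3^n * exp(-8 c^2 n), the bound on P(||W_1 - W_2||_{box,4} > c n^{-1/2})."""
    return 2 * 3 ** n * math.exp(-8 * c * c * n)

n_max = 40
assert all(exponent_bound_holds(n, 1.0) for n in range(2, n_max))
assert all(math.isclose(union_bound(n, 1.0), 2 * math.exp((math.log(3) - 8) * n))
           for n in range(1, n_max))
assert all(union_bound(n + 1, 1.0) < union_bound(n, 1.0) for n in range(1, n_max))
```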

Lemma 10.12. Let $W$ and $W_1, W_2, \dots$ be graphons on a probability space $\Omega$, and assume that $W$ is random-free. Then $\|W_n - W\|_\square \to 0$ as $n \to \infty$ if and only if $\|W_n - W\|_{L^1(\Omega^2)} \to 0$.

Proof. Assume $\|W_n - W\|_\square \to 0$. Since $W$ is random-free, it is (a.e.) the indicator $\mathbf{1}_A$ of a measurable set $A \subseteq \Omega^2$. Any such set can be approximated in measure by a finite disjoint union of rectangle sets $A_i \times B_i$, and we may assume that this set is symmetric since $A$ is; in other words, given any $\varepsilon > 0$, there exists a $\{0,1\}$-valued step graphon $W'$ such that $\|W - W'\|_{L^1} < \varepsilon$. Let the corresponding partition have $N = N(\varepsilon)$ parts. Lemma 10.7 then yields

$$\|W - W_n\|_{L^1} \le N^2\|W - W_n\|_\square + 2\varepsilon \to 2\varepsilon$$

as $n \to \infty$. Hence $\limsup_{n\to\infty}\|W - W_n\|_{L^1} \le 2\varepsilon$, and since $\varepsilon > 0$ is arbitrary, $\|W - W_n\|_{L^1} \to 0$.

The converse is obvious. □

Lemma 10.13. Let $W$ and $W_1, W_2, \dots$ be graphons defined on some probability spaces, and assume that $W$ is random-free. Then $\delta_\square(W_n, W) \to 0$ as $n \to \infty$ if and only if $\delta_1(W_n, W) \to 0$.

Proof. Assume that $\delta_\square(W, W_n) \to 0$. By replacing the graphons by equivalent ones, we may by Theorem 7.1 assume that all graphons are defined on $[0,1]$. By Theorem 6.9 we may then find measure-preserving bijections $\varphi_n : [0,1] \to [0,1]$ such that $\|W - W_n^{\varphi_n}\|_\square < \delta_\square(W, W_n) + 1/n \to 0$. Hence, $\|W - W_n^{\varphi_n}\|_{L^1([0,1]^2)} \to 0$ by Lemma 10.12, and thus $\delta_1(W, W_n) \to 0$.

The converse is obvious. □

Theorem 10.14. Let $W$ be a graphon. Then $W$ is random-free if and only if $\delta_1(W_{G_n}, W) \to 0$ for some sequence of graphs $G_n$.

Proof. There exists a sequence of graphs $G_n$ with $\delta_\square(W_{G_n}, W) \to 0$ by Remark B.2. If $W$ is random-free, then $\delta_1(W_{G_n}, W) \to 0$ by Lemma 10.13. The converse follows by Lemma 10.6, since each $W_{G_n}$ is random-free. □

Theorem 10.15. Let $W$ be a graphon. Then the following are equivalent.

(i) $W$ is random-free.

(ii) $\int W_n^2 \to \int W^2$ whenever $(W_n)$ is a sequence of graphons such that $\delta_\square(W_n, W) \to 0$.

(iii) $t(F, W_n) \to t(F, W)$ for every multigraph $F$ whenever $(W_n)$ is a sequence of graphons such that $\delta_\square(W_n, W) \to 0$.

Proof. (i) $\Rightarrow$ (iii): If $W$ is random-free and $\delta_\square(W_n, W) \to 0$, then Lemma 10.13 yields $\delta_1(W_n, W) \to 0$, and thus $t(F, W_n) \to t(F, W)$ for every multigraph $F$ by Lemma C.4.

(iii) $\Rightarrow$ (ii): Immediate by taking $F$ to be a double edge, for which $t(F, W) = \int W^2$; see Example C.3.

(ii) $\Rightarrow$ (i): Take a sequence of graphs $G_n$ such that $G_n \to W$, see Remark B.2; thus $\delta_\square(W_{G_n}, W) \to 0$. Hence $\int W_{G_n} \to \int W$. Further, every $W_{G_n}$ is $\{0,1\}$-valued, so $W_{G_n}^2 = W_{G_n}$; hence

$$\int W_{G_n}^2 = \int W_{G_n} \to \int W.$$

If (ii) holds, then also $\int W_{G_n}^2 \to \int W^2$. Hence $\int W^2 = \int W$, so $W$ is random-free by Lemma 10.4. □

Finally, we mention two characterizations of random-free graphons in terms of the finite or infinite random graph $G(n,W)$ defined in Appendix D. First the finite case and entropy.

Theorem 10.16. Let $W$ be a graphon. Then $W$ is random-free if and only if the entropy $\mathcal{E}(G(n,W)) = o(n^2)$ as $n \to \infty$.

Proof. This is an immediate consequence of Theorem D.5, since $h \ge 0$ and thus the right-hand side of (D.1) vanishes if and only if $h(W(x,y)) = 0$ a.e., which is equivalent to $W(x,y) \in \{0,1\}$ a.e. □

Problem 10.17. We may, as in [3, (15.30)], ask for the exact growth rate of $\mathcal{E}(G(n,W))$ for a random-free graphon $W$. It is easily seen that if $W$ is a step graphon, then $\mathcal{E}(G(n,W)) = O(n)$; we conjecture that the converse holds too. As another example, for the "half graphon" $W(x,y) = \mathbf{1}\{x + y > 1\}$ on $[0,1]$, it can be shown (e.g. using [22, Corollary 6.6]) that $\mathcal{E}(G(n,W)) = n\log n + O(n)$.

We represent the infinite random graph $G(\infty,W)$ by the family of indicator variables $J_{ij} := \mathbf{1}\{ij \text{ is an edge}\}$, $1 \le i < j < \infty$. We define the shell $\sigma$-field (or big tail $\sigma$-field) to be the intersection

$$\mathcal{S} := \bigcap_{n=1}^\infty \sigma\{J_{ij} : i < j,\ j > n\} \tag{10.5}$$

of the $\sigma$-fields generated by all $J_{ij}$ where at least one index is "big". Recall that a random variable is a.s. $\mathcal{S}$-measurable (or essentially $\mathcal{S}$-measurable) if it is a.s. equal to an $\mathcal{S}$-measurable variable; equivalently, it is measurable for the completion $\overline{\mathcal{S}}$ of $\mathcal{S}$.

Theorem 10.18. The following are equivalent for a graphon $W$:

(i) $W$ is random-free.

(ii) The infinite random graph $G(\infty,W)$ is a.s. $\mathcal{S}$-measurable.

(iii) The indicator $J_{12} := \mathbf{1}\{12 \text{ is an edge in } G(\infty,W)\}$ is a.s. $\mathcal{S}$-measurable.
Proof. This is the symmetric version of [1, Proposition 3.6]; see also [2, (14.15) and p. 133] and [?, (4.9)]. Since the details for the symmetric case are not given in these references, we give some of them for completeness. First, note that we can write the definition of $G(\infty,W)$ in Appendix D as

$$J_{ij} = \mathbf{1}\{\xi_{ij} < W(X_i, X_j)\}, \tag{10.6}$$

where $\xi_{ij}$ for $1 \le i < j$, and $X_i$ for $i \ge 1$, all are independent, and $X_i$ has distribution $\mu$ on $\Omega$ while $\xi_{ij}$ is uniform on $[0,1]$.

(i) $\Rightarrow$ (iii): If $W$ is random-free, then (10.6) simplifies to $J_{ij} = W(X_i, X_j)$. Consider the array $(J_{2i-1,2j})_{i,j=1}^\infty = (W(X_{2i-1}, X_{2j}))_{i,j=1}^\infty$, where the first index is odd and the second even; this is a separately exchangeable array, and by [?, Proposition 3.6] (or as a simple consequence of [?, Proposition 7.31]), it is a.s. $\mathcal{S}'$-measurable for the shell $\sigma$-field $\mathcal{S}'$ of this array. Since $\mathcal{S}' \subseteq \mathcal{S}$, (iii) follows.

(iii) $\Leftrightarrow$ (ii): $\mathcal{S}$ is invariant under finite permutations, so the exchangeability implies that every $J_{ij}$ is a.s. $\mathcal{S}$-measurable if $J_{12}$ is. The converse is trivial.

(iii) $\Rightarrow$ (i): It follows from (10.5) and (10.6) that $\xi_{12}$ is independent of $\mathcal{S}$. If (iii) holds, then $J_{12}$ is thus independent of $\xi_{12}$, which by (10.6) implies that $J_{12} = \mathbb{E}(J_{12} \mid X_1, X_2) = W(X_1, X_2)$ a.s., so $W$ is $\{0,1\}$-valued a.e. □
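The representation (10.6) also gives a direct way to sample the finite random graph $G(n,W)$. The sketch below is a hypothetical helper (not code from the references) that takes $W$ as a Python function and a sampler for $\mu$; for the random-free half graphon $W(x,y) = \mathbf{1}\{x + y > 1\}$ on $[0,1]$ the edge indicators no longer depend on the $\xi_{ij}$, exactly as in the proof of (i) $\Rightarrow$ (iii) above.

```python
import random

def sample_G(n, W, sample_x, rng):
    """Sample G(n, W) via (10.6): J_ij = 1{xi_ij < W(X_i, X_j)}.

    X_1, ..., X_n are i.i.d. with law mu (drawn by sample_x) and the xi_ij
    are i.i.d. uniform on [0, 1); vertices are labelled 0, ..., n-1 here.
    """
    X = [sample_x(rng) for _ in range(n)]
    edges = {(i, j) for i in range(n) for j in range(i + 1, n)
             if rng.random() < W(X[i], X[j])}
    return X, edges

def half_graphon(x, y):
    """The random-free 'half graphon' 1{x + y > 1} on [0, 1]."""
    return 1.0 if x + y > 1 else 0.0

rng = random.Random(2024)
X, E = sample_G(30, half_graphon, lambda r: r.random(), rng)
# For a {0,1}-valued W the edges are determined by the X_i alone.
assert E == {(i, j) for i in range(30) for j in range(i + 1, 30) if X[i] + X[j] > 1}
```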



Appendix A. Special probability spaces 

A.1. Atoms. An atom in a probability space $(\Omega,\mu)$ is a subset $A$ with $\mu(A) > 0$ such that every subset $B \subseteq A$ satisfies $\mu(B) = 0$ or $\mu(B) = \mu(A)$. We say that $\Omega$ is atomless if there are no atoms.

Lemma A.1. If $(\Omega,\mu)$ is an atomless probability space, then there exists a family $(A_r)_{r\in[0,1]}$ of measurable sets such that $\mu(A_r) = r$ for every $r \in [0,1]$, and further $A_r \subseteq A_s$ if $r \le s$ (i.e., the family is increasing).

Proof. Consider families $(A_r)_{r\in E}$ with these properties, defined on some arbitrary subset $E$ of $[0,1]$. By Zorn's lemma, there exists a maximal family; we claim that then $E = [0,1]$. In fact, $0, 1 \in E$, since we otherwise could enlarge the family by defining $A_0 = \emptyset$ or $A_1 = \Omega$. Further, $E$ is closed, since otherwise there would exist $r \notin E$ and a sequence $r_n \in E$ such that either $r_n \nearrow r$ or $r_n \searrow r$; in the first case we can define $A_r := \bigcup_n A_{r_n}$, and in the second case $A_r := \bigcap_n A_{r_n}$. Finally, if $E \neq [0,1]$, the complement $[0,1] \setminus E$ thus is open, and thus a disjoint union of open intervals. Let $(a,b)$ be one of these intervals. Then $a, b \in E$, and $A_b \setminus A_a$ is a set of measure $b - a > 0$. Since $\mu$ is atomless, there exists a subset $C \subseteq A_b \setminus A_a$ with $0 < \mu(C) < b - a$, but in this case the family could be extended by $A_{a+\mu(C)} := A_a \cup C$, so we again contradict the maximality of the family. Hence $E = [0,1]$, which completes the proof. □

We also give a reformulation in terms of a map to [0, 1]. 
Lemma A.2. If $(\Omega,\mu)$ is an atomless probability space, then there exists a measure-preserving map $\varphi : \Omega \to [0,1]$.

Proof. Let $(A_r)_r$ be as in Lemma A.1 and define $\varphi(x) := \inf\{r \in [0,1] : x \in A_r\}$ (assuming, as we may, that $A_1 = \Omega$). □

Lemma A.3. If $\varphi : \Omega_1 \to \Omega_2$ is measure-preserving and $\Omega_2$ is atomless, then $\Omega_1$ is atomless too.

Proof. Let $(A_r)_{r\in[0,1]}$ be a family of subsets of $\Omega_2$ with the properties in Lemma A.1. Then $B_r := \varphi^{-1}(A_r)$ defines a family of subsets of $\Omega_1$ with the same properties. Suppose that $A \subseteq \Omega_1$ is an atom. Then, for each $r$, $\mu(A \cap B_r) = 0$ or $\mu(A)$. Let $r_0 := \sup\{r : \mu(A \cap B_r) = 0\}$, and take any $r_- < r_0$ and $r_+ > r_0$. (If $r_0 = 0$, take $r_- = 0$, and if $r_0 = 1$, take $r_+ = 1$.) Then $\mu(A \cap B_{r_-}) = 0$ and $\mu(A \cap B_{r_+}) = \mu(A)$, so

$$\mu(A) = \mu(A \cap B_{r_+}) - \mu(A \cap B_{r_-}) = \mu\big(A \cap (B_{r_+} \setminus B_{r_-})\big) \le \mu(B_{r_+} \setminus B_{r_-}) = r_+ - r_-.$$

This is a contradiction, since $\mu(A) > 0$ while $r_+ - r_-$ can be arbitrarily small. □

In the opposite direction, there are typically many measure-preserving maps from an atomless space $\Omega_1$ into a space with atoms. Simple examples are the trivial map onto a one-point space, and the indicator function of a subset $B \subseteq \Omega_1$ seen as a map $(\Omega_1,\mu) \to (\{0,1\},\nu)$, which is measure-preserving if $\nu\{1\} = \mu(B)$.

A.2. Borel spaces. To define Borel spaces, it is simplest to begin with measurable spaces, without any particular measures.

We say that two measurable spaces $(\Omega,\mathcal{F})$ and $(\Omega',\mathcal{F}')$ are isomorphic if there is a bimeasurable bijection $\varphi : \Omega \to \Omega'$, i.e., a bijection such that both $\varphi$ and $\varphi^{-1}$ are measurable. (Similarly, two probability spaces $(\Omega,\mathcal{F},\mu)$ and $(\Omega',\mathcal{F}',\mu')$ are isomorphic if there exists a bimeasurable bijection that further is measure-preserving.)

A measurable space is Borel (also called standard [?] or Lusin [13]) if it is isomorphic to a Borel subset of a Polish space (i.e., a complete separable metric space) with its Borel $\sigma$-field. A probability space $(\Omega,\mathcal{F},\mu)$ is Borel if $(\Omega,\mathcal{F})$ is a Borel measurable space; equivalently, if it is isomorphic to a Borel subset of a Polish space equipped with a Borel measure.

In fact, we do not need arbitrary Polish spaces here; the following theorem shows that it suffices to consider subsets of $[0,1]$. We tacitly assume that $[0,1]$ and other Polish spaces are equipped with their Borel $\sigma$-fields.

Theorem A.4. The following are equivalent for a measurable space $(\Omega,\mathcal{F})$, and thus each property characterizes Borel measurable spaces.

(i) $(\Omega,\mathcal{F})$ is isomorphic to a Borel subset of a Polish space.

(ii) $(\Omega,\mathcal{F})$ is isomorphic to a Polish space.

(iii) $(\Omega,\mathcal{F})$ is isomorphic to a Borel subset of $[0,1]$.

(iv) $(\Omega,\mathcal{F})$ is either countable (with all subsets measurable), or isomorphic to $[0,1]$.

For a proof, see e.g. [?, Theorem 8.3.6] or [?, Theorem 1.2.12]. An essentially equivalent statement is that any two Borel measurable spaces with the same cardinality are isomorphic.

Hence, a Borel probability space is either countable or isomorphic to [0, 1] 
equipped with some Borel probability measure. Consequently we can, when 
dealing with Borel spaces, restrict ourselves to [0, 1] without much loss of 
generality (the countable case is typically simple), but for applications it is 
convenient to allow general Borel spaces. 

Remark A.5. Another simple Borel space is the Cantor cube $C := \{0,1\}^\infty$ (which up to homeomorphism is the same as the usual Cantor set); this is a compact metric space, and thus a Polish space. Since $C$ is uncountable, it is by Theorem A.4 isomorphic to $[0,1]$ as measurable spaces; consequently we may replace $[0,1]$ by $C$ in Theorem A.4.

One important property of Borel spaces is the following theorem by Kuratowski, showing that a measurable bijection is bimeasurable, and thus an isomorphism.

Theorem A.6. Let $\Omega$ and $\Omega'$ be Borel measurable spaces. If $f : \Omega \to \Omega'$ is a bijection that is measurable, then $f^{-1} : \Omega' \to \Omega$ is measurable, and thus $f$ is an isomorphism.

More generally, if $f : \Omega \to \Omega'$ is a measurable injection, then the image $f(\Omega)$ is a measurable subset of $\Omega'$ and $f$ is an isomorphism of $\Omega$ onto $f(\Omega)$.

For a proof, see e.g. [?, Proposition 8.3.5 and Theorem 8.3.7]; see also further results in [19, Sections 8.3 and 8.6].



Let us now add measures to the spaces. There is a version of Theorem A.4 for probability spaces. For simplicity we begin with the atomless case. Recall that $\lambda$ denotes the Lebesgue measure.

Theorem A.7. If $(\Omega,\mu)$ is an atomless Borel probability space, then there exists a measure-preserving bijection of $(\Omega,\mu)$ onto $([0,1],\lambda)$.

In other words, all atomless Borel probability spaces are isomorphic, as measure spaces.

Proof. Since $\Omega$ is atomless, every point has measure 0, and thus every countable subset has measure 0; in particular, $\Omega$ cannot be countable. By Theorem A.4(iv), there exists a bimeasurable bijection $\varphi_1$ of $\Omega$ onto $[0,1]$. This maps the measure $\mu$ onto some Borel measure $\nu$ on $[0,1]$.

Since $\nu$ has no atoms, $x \mapsto \nu([0,x])$ is a continuous non-decreasing map of $[0,1]$ onto itself. We let $\psi : [0,1] \to [0,1]$ be its right-continuous inverse defined by

$$\psi(t) := \sup\{x \in [0,1] : \nu([0,x]) \le t\}. \tag{A.1}$$
Then $\nu([0,\psi(t)]) = t$ for every $t \in [0,1]$, which implies that $\psi$ is strictly increasing. Hence, $\psi$ is injective and measurable, and by Theorem A.6, $\psi$ is a bimeasurable bijection of $[0,1]$ onto some Borel subset $B := \psi([0,1])$.

It follows from (A.1) that, for all $s, t \in [0,1]$, $\psi(t) < s \iff t < \nu([0,s])$, and thus $\psi^{-1}([0,s)) = [0,\nu([0,s]))$. Hence,

$$\lambda\big(\psi^{-1}([0,s))\big) = \nu([0,s]) = \nu([0,s)), \qquad s \in [0,1],$$

which implies that $\lambda_\psi = \nu$ (see Remark 5.4 for the notation), i.e., that $\psi : ([0,1],\lambda) \to ([0,1],\nu)$ is measure-preserving.

Consequently, $\psi$ is a measure-preserving bijection $\psi : ([0,1],\lambda) \to (B,\nu)$. Choose an uncountable null set $N \subset [0,1]$ (for example the Cantor set). Then $N' := \psi(N)$ is an uncountable null set in $(B,\nu)$. The restriction of $\psi$ to $[0,1] \setminus N$ is a measure-preserving bijection onto $B \setminus N'$. Further, $N$ and $N' \cup B^c$, where $B^c := [0,1] \setminus B$, are both uncountable Borel subsets of $[0,1]$, and thus by Theorem A.4 both are isomorphic as measurable spaces to $[0,1]$, and thus to each other. Hence there exists a measurable bijection $\psi_1 : N \to N' \cup B^c$.

Define $\psi_2 : [0,1] \to [0,1]$ by $\psi_2(x) = \psi(x)$ when $x \notin N$ and $\psi_2(x) = \psi_1(x)$ when $x \in N$. Then $\psi_2$ is a measure-preserving bijection $([0,1],\lambda) \to ([0,1],\nu)$. Consequently, $\psi_2^{-1} \circ \varphi_1$ is a measure-preserving bijection of $(\Omega,\mu)$ onto $([0,1],\lambda)$. □
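The right-continuous inverse (A.1) is the quantile transform familiar from simulation: pushing $\lambda$ forward through $\psi$ gives $\nu$. As a minimal Python sketch (our own illustration, not part of the proof), assume $\nu([0,x])$ is available as a continuous function `F`; then $\psi$ can be approximated by bisection. The measure with density $2x$, for which $F(x) = x^2$ and $\psi(t) = \sqrt t$, is only a test case.

```python
import random

def right_inverse(F, t, iters=60):
    """Approximate psi(t) = sup{x in [0,1] : F(x) <= t} from (A.1) by bisection.

    Assumes F is continuous and non-decreasing on [0,1] with F(0) = 0 and
    F(1) = 1, as is x -> nu([0,x]) for an atomless probability measure nu.
    """
    if F(1.0) <= t:
        return 1.0
    lo, hi = 0.0, 1.0  # invariant: F(lo) <= t < F(hi)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if F(mid) <= t:
            lo = mid
        else:
            hi = mid
    return lo

# Illustrative nu with density 2x on [0,1]: nu([0,x]) = x**2, psi(t) = sqrt(t).
F = lambda x: x * x
rng = random.Random(1)
samples = [right_inverse(F, rng.random()) for _ in range(5000)]
# psi pushes Lebesgue measure forward to nu, so roughly nu([0, 1/2]) = 1/4
# of the samples should land in [0, 1/2].
frac_below_half = sum(s <= 0.5 for s in samples) / len(samples)
print(right_inverse(F, 0.25), frac_below_half)
```

The sketch only illustrates the measure-preserving map $\psi$; the modification on a null set that turns it into a bijection is not something a simulation can show.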

It is easy to handle atoms too. An atom in a Borel probability space is, 
up to a null set, just a single point with a point mass; hence, a Borel space 
is atomless if and only if it has no point masses, i.e. no point with positive 
measure. In any Borel probability space there is at most a countable number 
of point masses, and removing them we obtain an atomless Borel measure 
space. This leads to the following characterization. 

Theorem A.8. A probability space is Borel if and only if it is isomorphic, by a measure-preserving bijection, to one of the following spaces.

(i) A countable set $D = \{x_i\}_{i=1}^n$ (where $n \le \infty$), with all subsets measurable and the discrete measure given by $\mu(A) = \sum_{i : x_i \in A} p_i$, for some $p_i \ge 0$. (Necessarily $\sum_i p_i = 1$.)

(ii) The disjoint union $D \cup N$, where $D$ is as in (i) and $N$ is a null set given by any uncountable Borel measurable space equipped with zero measure. (We may choose for example $N = [0,1]$ with zero measure, or the Cantor set with $\lambda$, which vanishes there.)

(iii) The disjoint union of a closed interval $([0,r],\lambda)$ with $0 < r \le 1$ and a countable set $D$ as in (i) (possibly empty); in this case $r + \sum_i p_i = 1$.

In each case we may further assume that each $p_i > 0$.

Proof. If $(\Omega,\mu)$ is a Borel probability space, let $D := \{x \in \Omega : \mu\{x\} > 0\}$ and $\Omega' := D^c = \Omega \setminus D$. Then $D$ is countable, and $\Omega'$ is atomless.

Let $r := \mu(\Omega')$. If $r > 0$, then by Theorem A.7 and a scaling, $\Omega'$ is isomorphic to $[0,r]$, which yields (iii). If $r = 0$, then $\Omega'$ is a null set. If further $\Omega'$ is uncountable, then $(\Omega',\mu) = (\Omega',0)$ is isomorphic to $(N,0)$ for any uncountable Borel space $N$ by Theorem A.4(iv), which yields (ii). Finally, if $\Omega'$ is countable, then $\Omega$ is countable and (i) holds. The converse is obvious. □



Theorem A.9. If $\Omega$ is a Borel probability space, then there is a measure-preserving map $[0,1] \to \Omega$.



Proof. It suffices to show this for the spaces in Theorem A.8(i)–(iii), and for these it is easy to construct explicit maps. (For each $x \in D$, map a suitable interval of length $\mu\{x\}$ to $x$; in (iii), map $[0,r]$ onto itself by the identity map.) □
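The explicit map in the proof of Theorem A.9 can be written out for a space of type (iii) with finitely many atoms. The sketch below is a hypothetical helper (`make_map` is our name), assuming the atoms are given as a list of labels with masses $p_i > 0$ and $r + \sum_i p_i = 1$; points of the interval part are tagged `'cont'`. With $r = 0$ it also covers types (i) and (ii), since the null set $N$ in (ii) needs no preimage. Countably many atoms would require a lazy search instead of a list.

```python
import random
from itertools import accumulate

def make_map(r, atoms):
    """Measure-preserving map from ([0,1], lambda) to [0, r] plus finitely many atoms.

    r     -- length of the interval part [0, r] (r = 0 if there is none)
    atoms -- list of (label, p) with p > 0 and r + (sum of the p) == 1
    """
    labels = [label for label, _ in atoms]
    # Right endpoints of consecutive intervals of lengths p_1, p_2, ... after r.
    ends = list(accumulate((p for _, p in atoms), initial=r))[1:]

    def phi(u):
        if u < r or not atoms:
            return ("cont", u)  # identity on the interval part
        for label, end in zip(labels, ends):
            if u < end:
                return label  # the interval of length p_i is sent to atom x_i
        return labels[-1]  # u = 1, or rounding at the right end: a null set
    return phi

phi = make_map(0.5, [("a", 0.25), ("b", 0.25)])
rng = random.Random(0)
images = [phi(rng.random()) for _ in range(20000)]
freq_a = images.count("a") / len(images)  # should be close to p_a = 1/4
print(phi(0.3), phi(0.6), phi(0.8), freq_a)
```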



A.3. Lebesgue spaces. A Lebesgue probability space is a probability space that is the completion of a Borel probability space; equivalently (see Theorem E3), it is isomorphic to a Polish space (or, equivalently, a Borel subset of a Polish space) equipped with the completion of a Borel measure.



Theorem A.8 leads directly to the following characterization.

Theorem A.10. A probability space is Lebesgue if and only if it is isomorphic, by a measure-preserving bijection, to one of the spaces given in Theorem A.8, with the modifications that in (ii) all subsets of $N$ are measurable (with measure 0), and in (iii) the interval $[0,r]$ is equipped with the Lebesgue σ-field $\mathcal{L}$. □

In other words, every Lebesgue probability space is, possibly ignoring a null set, isomorphic to either a countable discrete space, an interval $([0,r],\mathcal{L},\lambda)$, or a disjoint union of an interval and a countable discrete part.

Corollary A.11. An atomless Lebesgue space is isomorphic to $([0,1],\mathcal{L},\lambda)$.

Proof. Immediate from either Theorem A.10 or Theorem A.7. □

Remark A.12. Lebesgue spaces were introduced by Rohlin [56] by a different, intrinsic, definition, see also Haezendonck [34]. The equivalence to the definition above follows from [56, §2.4] or [34, Remark 2, p. 250].



Appendix B. Graph limits 

As said in the introduction, graph limits were introduced by Lovasz and Szegedy [48] and further developed by Borgs, Chayes, Lovasz, Sos and Vesztergombi [HI], [IH]. The central idea in graph limit theory is to assign limits to (some) sequences $G_n$ of (unlabelled) graphs with $|G_n| \to \infty$. Part of the importance of this notion is the fact that several different definitions of convergence turn out to be equivalent. One definition is the following, which has the advantage that it is easily adapted to many other situations such as hypergraphs, bipartite graphs, directed graphs, compactly decorated graphs and posets, see [!, [H, O, 0, 0, M, M,\M, M, E3].

For each $k \le |G_n|$, let $G_n[k]$ be the random induced subgraph of $G_n$ with $k$ vertices obtained by selecting $k$ (distinct) vertices $v_1,\dots,v_k \in G_n$ at random (uniformly); we regard $G_n[k]$ as a labelled graph with the vertices labelled $1,\dots,k$; equivalently, we regard $G_n[k]$ as a graph with vertex set $\{1,\dots,k\}$.

Definition B.1. A sequence of graphs $(G_n)$ with $|G_n| \to \infty$ converges if for each fixed $k$, the distribution of the random graph $G_n[k]$ converges as $n \to \infty$.

In other words, for each $k$ and each labelled graph $G$ with $|G| = k$, we require that $\lim_{n\to\infty} \mathbb{P}(G_n[k] = G)$ exists.
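For small graphs the distribution in Definition B.1 can be computed exactly by enumerating all ordered $k$-tuples of distinct vertices. A minimal sketch, assuming a graph given by a 0/1 adjacency matrix on vertices $0,\dots,n-1$ and encoding a labelled graph on $\{1,\dots,k\}$ by its edge set (the function name and encoding are ours); the cost grows like $n^k$, so this is only a way to make the definition concrete.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations

def induced_sample_distribution(adj, k):
    """Exact distribution of G[k] for a graph on vertices 0..n-1.

    adj is a symmetric 0/1 adjacency matrix. G[k] is the labelled graph on
    {1,...,k} induced by k distinct vertices chosen uniformly at random, the
    j-th chosen vertex getting label j. A labelled graph is encoded as a
    frozenset of its edges (i, j) with i < j.
    """
    n = len(adj)
    counts = Counter()
    for vs in permutations(range(n), k):  # all ordered k-tuples, equally likely
        edges = frozenset((i + 1, j + 1) for i, j in combinations(range(k), 2)
                          if adj[vs[i]][vs[j]])
        counts[edges] += 1
    total = sum(counts.values())
    return {g: Fraction(c, total) for g, c in counts.items()}

# The 4-cycle 0-1-2-3-0.
C4 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
dist2 = induced_sample_distribution(C4, 2)
print(dist2)
```

For a convergent sequence $(G_n)$ one would compare such distributions for increasing $n$ and fixed $k$.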

Given this notion of convergence, graph limits can be defined abstractly, 
as equivalence classes of convergent sequences of graphs. Equivalently, one 
can easily introduce a metric on the set of unlabelled finite graphs such that 
the convergent sequences become the Cauchy sequences in the metric, and 
then construct the completion of this metric space. 

It turns out that the space of limits can be identified with the quotient space $\mathcal{W} := \mathcal{W}^*/\cong$ of equivalence classes of graphons; see Lovasz and Szegedy [48] and Borgs, Chayes, Lovasz, Sos and Vesztergombi. In other words, every graph limit is represented by a graphon, but non-uniquely, since every equivalent graphon represents the same graph limit. (Conversely, non-equivalent graphons represent different graph limits.)

Moreover, convergence to graph limits can be described by the cut metric. If $(G_n)$ is a sequence of graphs with $|G_n| \to \infty$, and $W$ is a graphon, then $G_n$ converges to the graph limit represented by $W$ if and only if $\delta_\square(W_{G_n}, W) \to 0$, where $W_{G_n}$ is as in Example 2.7. In this case we also say that $(G_n)$ converges to $W$, and write $G_n \to W$ (remembering the non-uniqueness of $W$).

Remark B.2. In particular, for every graphon $W$, there exist sequences of graphs $(G_n)$ such that $G_n \to W$. (One construction of such $G_n$ is the random construction in Appendix D below.)

Convergence to graph limits can also be described by the homomorphism densities defined in Appendix C: $G_n \to W$ if and only if $t(F,G_n) \to t(F,W)$ for every simple graph $F$.

For details and many other results, see Borgs, Chayes, Lovasz, Sos and Vesztergombi; for further aspects, see e.g. Austin [3], Bollobas and Riordan [12], Borgs, Chayes, Lovasz, Sos and Vesztergombi [HI], Diaconis and Janson [24], Lovasz and Szegedy [5j], and Appendix D below.

Appendix C. Homomorphism densities 

Define, following [14] and [48], for a graphon $W : \Omega^2 \to [0,1]$ (or, more generally, any bounded symmetric function $W : \Omega^2 \to \mathbb{R}$) and a simple graph $F$ with vertex set $V(F)$ and edge set $E(F)$, the homomorphism density

$$t(F,W) := \int_{\Omega^{V(F)}} \prod_{ij \in E(F)} W(x_i,x_j)\, d\mu(x_1)\cdots d\mu(x_{|F|}). \qquad \text{(C.1)}$$
If $X_i$ are i.i.d. random variables with values in $\Omega$ and distribution $\mu$, we can write (C.1) as

$$t(F,W) = \mathbb{E} \prod_{ij \in E(F)} W(X_i,X_j). \qquad \text{(C.2)}$$

The homomorphism densities can be defined for graphs too by $t(F,G) := t(F,W_G)$. It is easily seen that $t(F,G)$ is the proportion of maps $V(F) \to V(G)$ that are graph homomorphisms (or, equivalently, the probability that a random map $V(F) \to V(G)$ is a graph homomorphism). (This explains the name homomorphism density.)
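The description of $t(F,G)$ as a proportion of homomorphisms translates directly into a brute-force computation. A sketch for small graphs, assuming $F$ is given by an edge list on $0,\dots,|F|-1$ and $G$ by a 0/1 adjacency matrix without loops (`hom_density` is our name, not a library function):

```python
from fractions import Fraction
from itertools import product

def hom_density(F_edges, F_order, adj):
    """t(F, G): the fraction of all maps V(F) -> V(G) that are homomorphisms.

    F has vertices 0..F_order-1 and edge list F_edges; G is given by a
    symmetric 0/1 adjacency matrix adj with zero diagonal (no loops), so a
    map sending both ends of an edge of F to the same vertex never counts.
    """
    n = len(adj)
    good = sum(
        all(adj[phi[i]][phi[j]] for i, j in F_edges)
        for phi in product(range(n), repeat=F_order)
    )
    return Fraction(good, n ** F_order)

K2 = [(0, 1)]
K3 = [(0, 1), (1, 2), (0, 2)]
C4 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
print(hom_density(K2, 2, C4), hom_density(K3, 3, C4))
```

Here $t(K_2,C_4) = 8/16$ is the edge density $2|E|/n^2$, and $t(K_3,C_4) = 0$ because $C_4$ is bipartite. Since the entries of `adj` are 0 or 1, repeating an edge in `F_edges` does not change the result; this is the observation used in the proof of Lemma C.2.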

The homomorphism densities have a central place in the graph limit theory. In particular, as shown in [14], $G_n \to W$ if and only if $t(F,G_n) \to t(F,W)$ for every simple graph $F$.

The definition (C.1) makes sense also for loopless multigraphs $F$, where we allow repeated edges. (Loops are not allowed, since we want $t(F,W) = t(F,W')$ when $W = W'$ a.e., and this rules out a factor $W(x_i,x_i)$ in (C.1).)

Example C.1. Let $M_k$ be the multigraph with 2 vertices connected by $k$ parallel edges. Then $t(M_k,W) = \int_{\Omega^2} W^k$.
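Example C.1 is easy to check for a step graphon $W$ that equals $A_{ij}$ on $I_i \times I_j$ for $m$ equal intervals $I_1,\dots,I_m$ of $[0,1]$. The integrand in (C.1) is then constant on each product of blocks, so the integral becomes an average over the $m^{|F|}$ ways of assigning the vertices of $F$ to blocks. The sketch below (names and the sample matrix are ours) accepts repeated edges, so it also covers loopless multigraphs.

```python
from itertools import product

def step_hom_density(F_edges, F_order, A):
    """t(F, W) from (C.1) for the step graphon W = A[i][j] on I_i x I_j.

    The m blocks I_1, ..., I_m of [0,1] have equal length 1/m, so the
    integral is an average over all block assignments
    phi : V(F) -> {0, ..., m-1}. F_edges may contain repeated pairs.
    """
    m = len(A)
    total = 0.0
    for phi in product(range(m), repeat=F_order):
        weight = 1.0
        for i, j in F_edges:
            weight *= A[phi[i]][phi[j]]
        total += weight
    return total / m ** F_order

def multi_edge(k):
    """The multigraph M_k: vertices 0 and 1 joined by k parallel edges."""
    return [(0, 1)] * k

A = [[0.2, 0.9], [0.9, 0.5]]
for k in (1, 2, 3):
    integral_W_k = sum(a ** k for row in A for a in row) / len(A) ** 2
    print(k, step_hom_density(multi_edge(k), 2, A), integral_W_k)
```

For the constant graphon $1/2$ (the $1\times1$ matrix `[[0.5]]`) this gives $t(M_1,W) = 1/2$ but $t(M_2,W) = 1/4$.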

We have seen in Theorem EE] that $t(F,W) = t(F,W')$ when $W \cong W'$, for every multigraph $F$. In other words, the mapping $W \mapsto t(F,W)$ yields a well-defined mapping on the quotient space $\mathcal{W} := \mathcal{W}^*/\cong$, which is the same as the space of graph limits, see Appendix B.

Lemma C.2. The mapping $W \mapsto t(F,W)$ is continuous on $(\mathcal{W},\delta_\square)$ if and only if $F$ is a simple graph.

In other words, if $\delta_\square(W_n,W) \to 0$, then $t(F,W_n) \to t(F,W)$ for every simple graph $F$. However, if $F$ is a multigraph with parallel edges, then $\delta_\square(W',W) = 0$ implies $t(F,W') = t(F,W)$, but $\delta_\square(W_n,W) \to 0$ does not imply $t(F,W_n) \to t(F,W)$.

Proof. It is easy to see that $W \mapsto t(F,W)$ is continuous in $\delta_\square$ for every simple $F$, see [3] or [48]; more precisely, for any graphons $W$ and $W'$,

$$|t(F,W) - t(F,W')| \le |E(F)|\, \delta_\square(W,W'). \qquad \text{(C.3)}$$

For the converse, suppose that the loopless multigraph $F$ is not simple, and let $F'$ be the simple graph obtained by identifying parallel edges in $F$. Thus $V(F') = V(F)$, but $|E(F')| < |E(F)|$.

Let $W$ be the constant graphon $1/2$ defined on $[0,1]$, and let $G_n$ be a sequence of graphs such that $G_n \to W$. (See Remark B.2. Such sequences are known as quasirandom, see [43]. For example, $G_n$ can be a realization of the random graph $G(n,1/2)$, see Appendix D.)

Let $W_{G_n}$ be the graphon corresponding to $G_n$ as in Example 2.7; we thus have $\delta_\square(W_{G_n},W) \to 0$. On the other hand, $W_{G_n}$ is $\{0,1\}$-valued, and thus $t(F,W_{G_n}) = t(F',W_{G_n})$ by (C.1). Hence, using the already proved part of the lemma for $F'$,

$$t(F,W_{G_n}) = t(F',W_{G_n}) \to t(F',W) = 2^{-|E(F')|} > 2^{-|E(F)|} = t(F,W). \qquad \square$$
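The mechanism behind this counterexample can be seen numerically. The sketch below samples one graph from $G(n,1/2)$ (the size $n = 400$ and the seed are our choices) and computes $t(K_2,W_G) = \int W_G$ and $t(M_2,W_G) = \int W_G^2$. They coincide because $W_G$ is $\{0,1\}$-valued, and both are close to $1/2$, whereas the limit $W = 1/2$ has $t(M_2,W) = 1/4$. It illustrates only the densities, not the convergence $\delta_\square(W_{G_n},W) \to 0$ itself.

```python
import random

def gnp_adjacency(n, p, rng):
    """Adjacency matrix (lists of 0/1) of one sample of the random graph G(n, p)."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj

n = 400
adj = gnp_adjacency(n, 0.5, random.Random(2024))
# For the step graphon W_G, the integrals of W_G and W_G**2 over [0,1]^2
# are averages of the matrix entries and of their squares.
t_K2 = sum(map(sum, adj)) / n ** 2
t_M2 = sum(a * a for row in adj for a in row) / n ** 2
# The corresponding densities of the constant limit graphon W = 1/2.
t_K2_limit, t_M2_limit = 0.5, 0.25
print(t_K2, t_M2, t_K2_limit, t_M2_limit)
```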

Example C.3. In particular, $W \mapsto t(K_2,W) = \int_{\Omega^2} W$ is continuous in the cut metric, but $W \mapsto t(M_2,W) = \int_{\Omega^2} W^2$, see Example C.1, is not. More generally, $W \mapsto \int_{\Omega^2} W^k$ is not continuous for any $k > 1$.

If we use the stronger metric $\delta_1$, we have continuity for multigraphs too. (This metric is, however, much less useful.)

Lemma C.4. The mapping $W \mapsto t(F,W)$ is continuous on $(\mathcal{W},\delta_1)$ for every loopless multigraph $F$.

We omit the easy proof, similar to the proof for $\delta_\square$ and simple graphs in [14] or [48].



Appendix D. Graphons and random graphs 

Let $W$ be a graphon, defined on some probability space $\Omega$. For $1 \le n \le \infty$, let $[n] := \{i \in \mathbb{N} : i \le n\}$; thus $[n] = \{1,\dots,n\}$ if $n$ is finite and $[\infty] = \mathbb{N}$. We define a random graph $G(n,W)$ with vertex set $[n]$ by first taking an i.i.d. sequence $(X_i)_{i=1}^n$ of random points in $\Omega$ with the distribution $\mu$, and then, given this sequence, letting $ij$ be an edge in $G(n,W)$ with probability $W(X_i,X_j)$; for a given sequence this is done independently for all pairs $(i,j) \in [n]^2$ with $i < j$. (I.e., we first sample $X_1, X_2, \dots$ at random, and then toss a biased coin for each possible edge.)

The random graphs G(n, W) thus generalize the standard random graphs 
G(n,p) obtained by taking W = p constant. Note that we may construct 
G(n,W) for all n by first constructing G(oo, W) and then taking the sub- 
graph induced by the first n vertices. 
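The two-step construction of $G(n,W)$ can be sketched as follows, assuming $\Omega = [0,1]$ with Lebesgue measure and a graphon given as a Python function `W(x, y)`; both are our illustrative choices, and vertices are numbered from 0.

```python
import random

def sample_gnw(n, W, rng):
    """Sample G(n, W) for a graphon W on [0,1] with Lebesgue measure.

    Step 1: draw i.i.d. uniform points X_0, ..., X_{n-1}.
    Step 2: independently for each pair i < j, add the edge ij with
            probability W(X_i, X_j).
    Returns the points and the edge set as a set of pairs (i, j), i < j.
    """
    X = [rng.random() for _ in range(n)]
    edges = {(i, j)
             for i in range(n) for j in range(i + 1, n)
             if rng.random() < W(X[i], X[j])}
    return X, edges

X, edges = sample_gnw(300, lambda x, y: x * y, random.Random(5))
# The expected edge density is the integral of W over [0,1]^2, here 1/4.
density = len(edges) / (300 * 299 / 2)
print(density)
```

When `W` takes only the values 0 and 1, the coin flip `rng.random() < W(...)` is deterministic, which is the simplification described in Remark D.2.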

This construction was introduced in graph limit theory in [48] and [14]. (For other uses, see e.g. @ and IH.)



Remark D.1. If $F$ is a labelled graph, then the homomorphism density $t(F,W)$ in (C.1) equals the probability that $F$ is a labelled subgraph of $G(\infty,W)$ (or of $G(n,W)$ for any $n \ge |F|$).

In particular, this shows that the family $(t(F,W))_F$ and the distribution of $G(\infty,W)$ determine each other; see further Theorem 8.10 and [24].
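Remark D.1 can be checked by simulation. For the illustrative choice $W(x,y) = xy$ on $[0,1]$ (ours), $t(K_3,W) = \bigl(\int_0^1 x^2\,dx\bigr)^3 = 1/27$, and the fraction of samples of $G(3,W)$ that contain the triangle should be close to this. A Monte Carlo sketch:

```python
import random

def triangle_probability(W, trials, rng):
    """Monte Carlo estimate of P(K_3 is a subgraph of G(3, W)), Omega = [0,1]."""
    hits = 0
    for _ in range(trials):
        x = [rng.random() for _ in range(3)]
        # Each of the three edges is present independently with prob. W(x_i, x_j).
        if all(rng.random() < W(x[i], x[j]) for i, j in ((0, 1), (1, 2), (0, 2))):
            hits += 1
    return hits / trials

W = lambda x, y: x * y
est = triangle_probability(W, 200_000, random.Random(7))
exact = (1 / 3) ** 3  # t(K_3, W) for W(x, y) = xy
print(est, exact)
```

Stopping at the first missing edge (the short-circuit in `all`) does not bias the estimate, since the coins for different edges are independent.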



Remark D.2. If $W$ is a random-free graphon, i.e., $W(x,y) \in \{0,1\}$ a.e., then the construction of $G(n,W)$ simplifies. We sample i.i.d. $X_1, X_2, \dots$ as before, and draw an edge $ij$ if and only if $W(X_i,X_j) = 1$; thus the second random step in the construction disappears. (This explains the name "random-free"; of course, $G(n,W)$ is still random, but it is now a deterministic function of the random $X_i$.)

The infinite random graph $G(\infty,W)$ is an exchangeable random graph, i.e., its distribution is invariant under permutations of the vertices, and every exchangeable random graph is a mixture of such graphs, i.e., it can be obtained by this construction with a random $W$. This is an instance of the representation theorem for exchangeable arrays by Aldous [1] and Hoover [35], see also Kallenberg [43]. Moreover, by Theorem 8.10, if $W'$ is another graphon, then $G(\infty,W)$ and $G(\infty,W')$ have the same distribution if and only if $W \cong W'$. Consequently, the mapping $W \mapsto G(\infty,W)$ gives a bijection between the set $\mathcal{W} = \mathcal{W}^*/\cong$ of equivalence classes of graphons and a subset $\widehat{\mathcal{X}}$ of the set $\mathcal{X}$ of distributions of exchangeable infinite random graphs; this subset $\widehat{\mathcal{X}}$ is easily characterized in several different ways, for example as follows.

Lemma D.3. For an exchangeable infinite random graph $G$, the following are equivalent, and thus all characterize $\mathcal{L}(G) \in \widehat{\mathcal{X}}$.

(i) $G \overset{d}{=} G(\infty,W)$ for some graphon $W$.

(ii) The distribution $\mathcal{L}(G)$ is an extreme point in $\mathcal{X}$.

(iii) $G$ is ergodic: every property that is (a.s.) invariant under left-shift (i.e., delete vertex 1 and its edges and relabel the remaining vertices $i \mapsto i-1$) has probability 0 or 1.

(iv) Every property of $G$ that is (a.s.) invariant under finite permutations of the vertices has probability 0 or 1.

(v) For any two disjoint subsets of vertices $V_1$ and $V_2$, the induced subgraphs $G|_{V_1}$ and $G|_{V_2}$ are independent.

Proof. See and 0. □ 

D.1. Graph limits and random graphs. There is also a simple connection between graph limits and exchangeable infinite random graphs. By Definition B.1, if $(G_n)$ is a convergent sequence of graphs with $|G_n| \to \infty$, then for each $k$ there exists a random graph $G[k]$ on the vertex set $[k]$ such that $G_n[k] \overset{d}{\to} G[k]$. The distributions of $G[k]$ for different $k$ are consistent, so by Kolmogorov's extension theorem, there exists a random infinite graph $G$ on $[\infty]$ such that $G[k] \overset{d}{=} G|_{[k]}$, i.e., $G_n[k] \overset{d}{\to} G|_{[k]}$. Each $G_n[k]$ has an exchangeable distribution, and thus so has each $G[k]$; consequently, $G$ is an exchangeable infinite random graph; furthermore, it is easily seen that $G$ satisfies Lemma D.3(v), and thus its distribution belongs to $\widehat{\mathcal{X}}$. Thus every graph limit can be represented by an exchangeable infinite random graph with distribution in $\widehat{\mathcal{X}}$. Conversely, if $G$ is any exchangeable infinite random graph with a distribution in $\widehat{\mathcal{X}}$, then the induced subgraphs $G_n := G|_{[n]}$ a.s. satisfy $G_n[k] \overset{d}{\to} G|_{[k]}$ for every $k$, as can be seen from the limit theorem for reverse martingales [13] or directly; thus the sequence $(G_n)$ converges a.s., and its limit is represented by the infinite random graph $G$.

This yields a bijection between the set of graph limits and the set $\widehat{\mathcal{X}}$, characterized in Lemma D.3, of distributions of exchangeable infinite random graphs.
This connection between graph limits and (distributions of) exchangeable infinite random graphs combines with the connection above between (equivalence classes of) graphons and (distributions of) exchangeable infinite random graphs to prove the central fact stated in Appendix B that there is a bijection between graph limits and equivalence classes of graphons; see further [j, 0], 0, Q.

In particular, for any graphon $W$, we have a.s. $G(n,W) \to W$ as $n \to \infty$, in the sense of Appendix B [j-dlj], cf. Remark B.2.

Remark D.4. This method of proving the connection between graph limits and graphons through the use of exchangeable infinite random graphs as an intermediary generalizes immediately to several extensions of the theory, and it may be used to find the correct analogue of graphons in new situations. See for example [J] (hypergraphs) and [24] (bipartite graphs, directed graphs).

Another example is compact decorated graphs [52], which are graphs with edges labelled by elements of a fixed second-countable compact space $\mathcal{K}$ (i.e., a compact metrizable space [0, Theorem 4.2.8]); this includes several interesting cases. $\mathcal{K}$-decorated graph limits are defined as in Definition B.1, now with $\mathcal{K}$-decorated graphs. The arguments sketched above show that there is a bijection between $\mathcal{K}$-decorated graph limits and distributions of exchangeable $\mathcal{K}$-decorated infinite random graphs satisfying the properties in Lemma D.3, and a further bijection to equivalence classes of graphons, where the graphons now take their values in the space $\mathcal{P}(\mathcal{K})$ of Borel probability measures on $\mathcal{K}$. (The representation theorem in [42] yields a representation where the label of $ij$ is $f(X_i,X_j,\xi_{ij})$ for some fixed function $f : [0,1]^3 \to \mathcal{K}$ with $X_i$ and $\xi_{ij}$ uniform on $[0,1]$ and independent of each other; it is easily seen that this leads to an equivalent representation by $\mathcal{P}(\mathcal{K})$-valued graphons $W : [0,1]^2 \to \mathcal{P}(\mathcal{K})$.) For a different proof, see [52]. Many results in Sections 6–8 above extend to this case, but we leave that to the reader.

In fact, the arguments above on the equivalences work for any Polish space $\mathcal{K}$, also non-compact; however, compactness implies that the resulting space of decorated graph limits is compact, which is important for some results.

D.2. Entropy. If we regard $G(n,W)$ as a labelled random graph, we may identify it with the collection $(J_{ij})_{i<j}$ of the $\binom{n}{2}$ edge indicators $J_{ij} := \mathbf{1}\{ij \text{ is an edge}\}$, $1 \le i < j \in [n]$. For finite $n$, $G(n,W)$ is thus a discrete random variable with $2^{\binom{n}{2}}$ possible outcomes. Recall that for any discrete random variable $Z$, with outcomes (in any space) having probabilities $p_1, p_2, \dots$, say, its entropy $\mathcal{E}(Z)$ is defined by

$$\mathcal{E}(Z) := -\sum_i p_i \log p_i.$$

We also write $\mathcal{E}(Z_1,\dots,Z_n)$ for the entropy of a vector $(Z_1,\dots,Z_n)$, and $\mathcal{E}(Z \mid Z')$ for the entropy of the conditioned random variable $(Z \mid Z')$.
The following asymptotic calculation of the entropy of $G(n,W)$ is a special case of the symmetric version of the formula in [2, Remarks, p. 146]. Let

$$h(p) := -p\log p - (1-p)\log(1-p), \qquad p \in [0,1];$$

thus the entropy of a $\{0,1\}$-valued random variable $Z \in \mathrm{Be}(p)$ is $h(p)$. Note that $h$ is continuous on $[0,1]$ with $0 \le h(p) \le \log 2$ and $h(0) = h(1) = 0$.

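Theorem D.5. Let $W$ be a graphon, defined on a probability space $(\Omega,\mu)$. Then, as $n \to \infty$,

$$\binom{n}{2}^{-1} \mathcal{E}\bigl(G(n,W)\bigr) \to \int_{\Omega^2} h\bigl(W(x,y)\bigr)\, d\mu(x)\, d\mu(y). \qquad \text{(D.1)}$$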



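Proof. If we condition on $X_1,\dots,X_n$, then the $J_{ij}$ are independent and each $J_{ij} \in \mathrm{Be}(p_{ij})$ with $p_{ij} = W(X_i,X_j)$. Thus, using in the calculations here and below some simple standard results on entropy,

$$\mathcal{E}\bigl(G(n,W) \mid X_1,\dots,X_n\bigr) = \sum_{i<j} \mathcal{E}(J_{ij} \mid X_1,\dots,X_n) = \sum_{i<j} \mathcal{E}\bigl(\mathrm{Be}(p_{ij})\bigr) = \sum_{i<j} h\bigl(W(X_i,X_j)\bigr).$$

Hence,

$$\mathcal{E}\bigl(G(n,W)\bigr) \ge \mathbb{E}\,\mathcal{E}\bigl(G(n,W) \mid X_1,\dots,X_n\bigr) = \mathbb{E}\sum_{i<j} h\bigl(W(X_i,X_j)\bigr) = \binom{n}{2} \int_{\Omega^2} h\bigl(W(x,y)\bigr)\, d\mu(x)\, d\mu(y).$$

Thus the left-hand side of (D.1) is greater than or equal to the right-hand side for every $n \ge 2$.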

To obtain a corresponding upper bound, we for convenience assume that $\Omega = (0,1]$, as we may by Theorem 7.1 (noting that $\iint h(W)$ is preserved by pull-backs, and thus by equivalence, see Theorem 8.3).

Fix an integer $m$ and let $M_i := \lceil m X_i \rceil$. Thus $M_i = k \iff X_i \in I_{km} := ((k-1)/m, k/m]$.

We have

$$\mathcal{E}\bigl(G(n,W)\bigr) \le \mathcal{E}\bigl(G(n,W), M_1,\dots,M_n\bigr) = \mathcal{E}(M_1,\dots,M_n) + \mathbb{E}\bigl(\mathcal{E}(G(n,W) \mid M_1,\dots,M_n)\bigr). \qquad \text{(D.2)}$$

Since $M_1,\dots,M_n$ are independent and uniformly distributed on $\{1,\dots,m\}$,

$$\mathcal{E}(M_1,\dots,M_n) = \sum_{i=1}^n \mathcal{E}(M_i) = n \log m. \qquad \text{(D.3)}$$

Moreover,

$$\mathcal{E}\bigl(G(n,W) \mid M_1,\dots,M_n\bigr) \le \sum_{i<j} \mathcal{E}(J_{ij} \mid M_1,\dots,M_n) = \sum_{i<j} \mathcal{E}(J_{ij} \mid M_i, M_j). \qquad \text{(D.4)}$$
Define, for $k,l = 1,\dots,m$,

$$w_m(k,l) := \mathbb{E}\bigl(W(X_1,X_2) \mid M_1 = k, M_2 = l\bigr) = m^2 \iint_{I_{km} \times I_{lm}} W(x,y)\, dx\, dy,$$

the average of $W$ over $I_{km} \times I_{lm}$, and let

$$W_m(x,y) := w_m(k,l) \quad \text{if } x \in I_{km},\ y \in I_{lm}.$$

Thus $W_m(X_i,X_j)$ equals the conditional expectation $\mathbb{E}\bigl(W(X_i,X_j) \mid M_i, M_j\bigr)$. Given $M_i = k$ and $M_j = l$,

$$\mathbb{P}(J_{ij} = 1) = \mathbb{E}\bigl(W(X_i,X_j) \mid M_i = k, M_j = l\bigr) = w_m(k,l),$$

and thus

$$\mathcal{E}(J_{ij} \mid M_i = k, M_j = l) = h\bigl(w_m(k,l)\bigr).$$

Consequently,

$$\mathbb{E}\bigl(\mathcal{E}(J_{ij} \mid M_i, M_j)\bigr) = m^{-2} \sum_{k,l=1}^m h\bigl(w_m(k,l)\bigr) = \iint_{[0,1]^2} h\bigl(W_m(x,y)\bigr)\, dx\, dy. \qquad \text{(D.5)}$$

Combining (D.2)–(D.5), we obtain

$$\mathcal{E}\bigl(G(n,W)\bigr) \le n \log m + \binom{n}{2} \iint_{[0,1]^2} h\bigl(W_m(x,y)\bigr)\, dx\, dy,$$

and thus, for every $m \ge 1$,

$$\limsup_{n\to\infty} \binom{n}{2}^{-1} \mathcal{E}\bigl(G(n,W)\bigr) \le \iint_{[0,1]^2} h\bigl(W_m(x,y)\bigr)\, dx\, dy.$$

Now let $m \to \infty$. Then $W_m(x,y) \to W(x,y)$ a.e., and thus the right-hand side tends to $\iint h(W)$ by dominated convergence. □
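The upper bounds $\iint h(W_m)$ in this proof can be computed exactly for the illustrative graphon $W(x,y) = xy$ (our choice), because the average of $xy$ over the block $((k-1)/m, k/m] \times ((l-1)/m, l/m]$ is $\frac{k-1/2}{m}\cdot\frac{l-1/2}{m}$. For $m = 1,2,4,\dots$ the partitions are nested, so by the concavity of $h$ (Jensen's inequality) the bounds are non-increasing in $m$; by the last step of the proof they tend to $\iint h(W)$. A minimal sketch, with entropy in nats:

```python
import math

def h(p):
    """Binary entropy h(p) in nats, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def block_bound(m):
    """Integral of h(W_m) over [0,1]^2 for W(x, y) = x*y.

    W_m equals the block average of x*y, which is ((k - 1/2)/m) * ((l - 1/2)/m)
    on the (k, l) block, since x*y factorizes.
    """
    total = 0.0
    for k in range(1, m + 1):
        for l in range(1, m + 1):
            total += h((k - 0.5) / m * (l - 0.5) / m)
    return total / m ** 2

bounds = [block_bound(2 ** j) for j in range(7)]  # m = 1, 2, 4, ..., 64
print(bounds)
```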

Appendix E. Other versions of the cut norm 

There are several other versions of the cut norm that are equivalent to the versions in (4.2) and (4.3) within constant factors or, in Subsection E.3, at least in a weaker sense.

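E.1. Restrictions on the pairs of subsets. First, we may restrict the subsets $S$ and $T$ of $\Omega$ in (4.2) in various ways. Borgs, Chayes, Lovasz, Sos and Vesztergombi [14, Section 7] give three versions where it is assumed that, respectively, $S = T$, $S$ and $T$ are disjoint, and $S$ and $T$ are the complements of each other, i.e.,

$$\|W\|_{\square,3} := \sup_{S} \Bigl| \int_{S\times S} W(x,y)\, d\mu(x)\, d\mu(y) \Bigr|, \qquad \text{(E.1)}$$

$$\|W\|_{\square,4} := \sup_{S \cap T = \emptyset} \Bigl| \int_{S\times T} W(x,y)\, d\mu(x)\, d\mu(y) \Bigr|, \qquad \text{(E.2)}$$

$$\|W\|_{\square,5} := \sup_{S} \Bigl| \int_{S\times S^c} W(x,y)\, d\mu(x)\, d\mu(y) \Bigr|. \qquad \text{(E.3)}$$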
These have natural combinatorial interpretations for graphs as follows. For a graph $G$ with vertex set $V$ and edge set $E$, we define, for $A, B \subseteq V$,

$$e(A,B) = e_G(A,B) := \bigl|\{(x,y) \in A \times B : \{x,y\} \in E\}\bigr|; \qquad \text{(E.4)}$$

we also write $e_G(A) := e_G(A,A)$. (Thus, if $A$ and $B$ are disjoint, then $e(A,B)$ is the number of edges between $A$ and $B$. On the other hand, $e(A)$ is twice the number of edges in $A$.)

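Lemma E.1. Let $G_1$ and $G_2$ be two graphs on the same vertex set $V$, and let $n := |V|$. Then, for both versions of $W_G$ (on the vertex set $V$ with the uniform measure, and on $[0,1]$),

$$\|W_{G_1} - W_{G_2}\|_{\square,3} = \frac{1}{n^2} \max_{A \subseteq V} \bigl|e_{G_1}(A) - e_{G_2}(A)\bigr|, \qquad \text{(E.5)}$$

$$\|W_{G_1} - W_{G_2}\|_{\square,4} = \frac{1}{n^2} \max_{A,B \subseteq V,\ A \cap B = \emptyset} \bigl|e_{G_1}(A,B) - e_{G_2}(A,B)\bigr|, \qquad \text{(E.6)}$$

$$\|W_{G_1} - W_{G_2}\|_{\square,5} = \frac{1}{n^2} \max_{A \subseteq V} \bigl|e_{G_1}(A,A^c) - e_{G_2}(A,A^c)\bigr|. \qquad \text{(E.7)}$$

In particular, $\|W_{G_1} - W_{G_2}\|_{\square,5}$ measures directly the maximal difference in size of cuts in $G_1$ and $G_2$, which explains the name "cut norm".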

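Proof. For the version on $V$ this is immediate, since for every $S,T \subseteq \Omega = V$ we have

$$\int_{S\times T} W_{G_\ell} = \frac{1}{n^2}\, e_{G_\ell}(S,T), \qquad \ell = 1,2.$$

For the version on $[0,1]$, let $(a_{ij}^{(\ell)})$ be the adjacency matrix of $G_\ell$, so $W_{G_\ell}(x,y) = a_{ij}^{(\ell)}$ for $x \in I_{in}$, $y \in I_{jn}$. If $S,T \subseteq [0,1]$, let $s_i := \lambda(S \cap I_{in})$ and $t_j := \lambda(T \cap I_{jn})$. Then

$$\int_{S\times T} (W_{G_1} - W_{G_2}) = \sum_{i,j=1}^n \bigl(a_{ij}^{(1)} - a_{ij}^{(2)}\bigr) s_i t_j. \qquad \text{(E.8)}$$

It follows that

$$\|W_{G_1} - W_{G_2}\|_{\square,3} = \sup_{0 \le s_i \le 1/n} \Bigl| \sum_{i,j=1}^n \bigl(a_{ij}^{(1)} - a_{ij}^{(2)}\bigr) s_i s_j \Bigr|, \qquad \text{(E.9)}$$

$$\|W_{G_1} - W_{G_2}\|_{\square,4} = \sup_{s_i,t_i \ge 0,\ s_i + t_i \le 1/n} \Bigl| \sum_{i,j=1}^n \bigl(a_{ij}^{(1)} - a_{ij}^{(2)}\bigr) s_i t_j \Bigr|, \qquad \text{(E.10)}$$

$$\|W_{G_1} - W_{G_2}\|_{\square,5} = \sup_{0 \le s_i \le 1/n} \Bigl| \sum_{i,j=1}^n \bigl(a_{ij}^{(1)} - a_{ij}^{(2)}\bigr) s_i \bigl(\tfrac1n - s_j\bigr) \Bigr|. \qquad \text{(E.11)}$$

Since $a_{ii}^{(\ell)} = 0$, the diagonal terms in these sums vanish, and thus the sums are affine functions of each $s_i$ and (for $\|\cdot\|_{\square,4}$) of each pair $(s_i,t_i)$. Hence, the suprema are attained when all $s_i$ are either $0$ or $1/n$ (and, for $\|\cdot\|_{\square,4}$, each $(s_i,t_i)$ is $(0,0)$, $(1/n,0)$ or $(0,1/n)$), i.e., when $S$ and $T$ are unions $S = \bigcup_{i\in A} I_{in}$ and $T = \bigcup_{j\in B} I_{jn}$ for some $A,B \subseteq V$; but then $\int_{S\times T} W_{G_\ell} = n^{-2} e_{G_\ell}(A,B)$, so we obtain the same result as for the version on $V$. □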
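For small graphs the right-hand sides of (E.5)–(E.7) can be evaluated by exhausting all subsets. The sketch below (function names ours) uses the counting function $e_G(A,B)$ of (E.4), runs over all $2^n$ sets $A$ for (E.5) and (E.7) and over all $3^n$ disjoint pairs $(A,B)$ for (E.6), and returns exact fractions.

```python
from fractions import Fraction
from itertools import product

def e(adj, A, B):
    """e_G(A, B) from (E.4): ordered pairs (x, y) in A x B with xy an edge."""
    return sum(adj[x][y] for x in A for y in B)

def cut_norm_versions(adj1, adj2):
    """Right-hand sides of (E.5), (E.6), (E.7) for two graphs on {0, ..., n-1}."""
    n = len(adj1)
    best3 = best4 = best5 = 0
    for bits in product((0, 1), repeat=n):
        A = [v for v in range(n) if bits[v]]
        Ac = [v for v in range(n) if not bits[v]]
        best3 = max(best3, abs(e(adj1, A, A) - e(adj2, A, A)))
        best5 = max(best5, abs(e(adj1, A, Ac) - e(adj2, A, Ac)))
    # Label each vertex 0 (in A), 1 (in B) or 2 (in neither): all disjoint pairs.
    for labels in product((0, 1, 2), repeat=n):
        A = [v for v in range(n) if labels[v] == 0]
        B = [v for v in range(n) if labels[v] == 1]
        best4 = max(best4, abs(e(adj1, A, B) - e(adj2, A, B)))
    return tuple(Fraction(b, n * n) for b in (best3, best4, best5))

# A single edge versus the empty graph on two vertices.
edge = [[0, 1], [1, 0]]
empty = [[0, 0], [0, 0]]
print(cut_norm_versions(edge, empty))
```

Here $e(A)$ counts each edge twice, so the single edge gives $2/4$ in (E.5) but only $1/4$ in (E.6) and (E.7).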



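Lemma E.2 ([14]). If $\Omega$ is atomless and $W \in L^1(\Omega^2)$ is symmetric, then the norms $\|W\|_{\square,i}$, $i = 1,\dots,5$, are equivalent. More precisely,

$$\|W\|_{\square,1} \le \|W\|_{\square,2} \le 4\|W\|_{\square,1} \quad \text{for all } \Omega \text{ and } W; \qquad \text{(E.12)}$$

$$\tfrac12 \|W\|_{\square,1} \le \|W\|_{\square,3} \le \|W\|_{\square,1} \quad \text{if } W \text{ is symmetric}; \qquad \text{(E.13)}$$

$$\tfrac14 \|W\|_{\square,1} \le \|W\|_{\square,4} \le \|W\|_{\square,1} \quad \text{if } \Omega \text{ is atomless}; \qquad \text{(E.14)}$$

$$\tfrac23 \|W\|_{\square,4} \le \|W\|_{\square,5} \le \|W\|_{\square,4} \quad \text{if } W \text{ is symmetric}. \qquad \text{(E.15)}$$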

Proof. The inequalities (E.12) were given in (4.4). For the others, the right-hand sides are trivial.

For the left-hand sides, let $W(S,T) := \int_{S\times T} W$. Then (E.13) follows from $W(S,T) = W(T,S)$ and

$$W(S,T) + W(T,S) = W(S\cup T, S\cup T) + W(S\cap T, S\cap T) - W(S\setminus T, S\setminus T) - W(T\setminus S, T\setminus S),$$

which gives $2|W(S,T)| \le 4\|W\|_{\square,3}$.

For (E.14) we randomize. Let $(A_i)_{i=1}^n$ be a partition of $\Omega$ with $\mu(A_i) = 1/n$ for each $i$ (such partitions exist when $\Omega$ is atomless as a consequence of Lemma A.1), let $I$ be a random subset of $\{1,\dots,n\}$ defined by including each element with probability 1/2, independently of each other, and define a random subset $B$ of $\Omega$ by $B := \bigcup_{i\in I} A_i$. Then, for any $S,T \subseteq \Omega$, $|W(S\cap B, T\setminus B)| \le \|W\|_{\square,4}$. Moreover,

$$\mathbb{E}\, W(S\cap B, T\setminus B) = \sum_{i \ne j} \tfrac14 W(S\cap A_i, T\cap A_j) = \tfrac14 W(S,T) - \tfrac14 \sum_i W(S\cap A_i, T\cap A_i).$$

The last sum is the integral of $W$ over a subset of $\Omega^2$ of measure at most $1/n$, so it tends to 0 as $n \to \infty$ (since $W \in L^1$). Consequently, $\frac14 |W(S,T)| \le \|W\|_{\square,4}$, and (E.14) follows.

For (E.15), assume that $S \cap T = \emptyset$. Let $R := (S\cup T)^c$. The result follows from

$$W(S,T) + W(T,S) = W(S, T\cup R) + W(T, S\cup R) - W(S\cup T, R),$$

since $T \cup R = S^c$, $S \cup R = T^c$ and $R = (S\cup T)^c$, so that $2|W(S,T)| \le 3\|W\|_{\square,5}$ when $W$ is symmetric. □

Remark E.3. Some restrictions are necessary in Lemma E.2. For example, if $W$ is anti-symmetric ($W(x,y) = -W(y,x)$), then $\|W\|_{\square,3} = 0$, so (E.13) does not hold for arbitrary $W$. More generally, if $\widetilde W(x,y) := \frac12\bigl(W(x,y) + W(y,x)\bigr)$ is the symmetrization of $W$, then $\|\cdot\|_{\square,3}$ never distinguishes between $W$ and $\widetilde W$, so $\|\cdot\|_{\square,3}$ is appropriate only for symmetric $W$.

Similarly, if $\Omega$ has an atom $A$ and $W(x,y) := \mathbf{1}\{x \in A,\ y \in A\}$, then $\|W\|_{\square,4} = 0$ and (E.14) does not hold. Hence, in general $\|\cdot\|_{\square,4}$ and $\|\cdot\|_{\square,5}$ are not appropriate for spaces with atoms. (However, they work well also for $W_G$ for graphs $G$, because $W_G(x,x) = 0$ for every $x$, see Lemma E.1 and its proof.)






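If $W$ is anti-symmetric and the marginal $\int_\Omega W(x,y)\, d\mu(y) = 0$ for a.e. $x$, then

$$\int_{S\times S^c} W = \int_{S\times\Omega} W - \int_{S\times S} W = 0 \qquad \text{(E.16)}$$

for every $S$, so $\|W\|_{\square,5} = 0$ and (E.15) does not hold (unless $W = 0$ a.e.). For example, we can take $W(x,y) = \sin(2\pi(x-y))$ on $[0,1]$, or take $\Omega = \{1,2,3\}$ with $\mu(i) = 1/3$ for each $i \in \Omega$, and $W(i,j) \in \{-1,0,1\}$ with $W(i,j) \equiv i - j \pmod 3$. (In fact, if $\Omega$ is atomless, then $\|W\|_{\square,5} = 0$ if and only if $W$ is anti-symmetric and its marginals vanish a.e. To see this, note that if $\|W(x,y)\|_{\square,5} = 0$, then $\|W(y,x)\|_{\square,5} = 0$ as well, and thus $\|\widetilde W\|_{\square,5} = 0$. By Lemma E.2 then $\|\widetilde W\|_{\square,2} = 0$ and thus $\widetilde W = 0$ a.e., i.e., $W$ is anti-symmetric. Since $\int_{S\times S} W = 0$ by anti-symmetry, the first equality in (E.16) gives $\int_{S\times\Omega} W = \int_{S\times S^c} W = 0$ for every $S \subseteq \Omega$, and thus the marginal $x \mapsto \int_\Omega W(x,y)\, d\mu(y)$ vanishes a.e.) Cf. [39, Section 9].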
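The three-point example can be verified by brute force over all subsets of $\Omega$. Below the points are indexed $0,1,2$ instead of $1,2,3$, which changes nothing since only $i - j \bmod 3$ matters. The sketch (ours, with exact fractions) finds $\|W\|_{\square,5} = 0$ while $\|W\|_{\square,4} = 1/9$, so no inequality of the form (E.15) can hold for this $W$.

```python
from fractions import Fraction
from itertools import product

POINTS = range(3)  # Omega = {0, 1, 2}, each point of mass 1/3

def W(i, j):
    """W(i, j) in {-1, 0, 1} with W(i, j) = i - j (mod 3)."""
    return (i - j + 1) % 3 - 1

def integral(S, T):
    """Integral of W over S x T; each pair (i, j) has mass 1/9."""
    return Fraction(sum(W(i, j) for i in S for j in T), 9)

subsets = [[i for i in POINTS if bits[i]] for bits in product((0, 1), repeat=3)]
norm5 = max(abs(integral(S, [j for j in POINTS if j not in S])) for S in subsets)
norm4 = max(abs(integral(S, T)) for S in subsets for T in subsets
            if not set(S) & set(T))
print(norm4, norm5)
```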

Remark E.4. If $W \cong W'$, then $\|W\|_{\square,3} = \|W'\|_{\square,3}$; this is easily seen first for pull-backs by the argument in the proof of Lemma 5.5 and then in general by Theorem 8.3. The same holds for $\|\cdot\|_{\square,4}$ and $\|\cdot\|_{\square,5}$ provided $W$ and $W'$ are defined on atomless spaces, using also a randomization argument similar to the one in the proof of Lemma E.2. However, this is not true in general for spaces with atoms. For a trivial example, let $W = 1$ on $[0,1]$ and $W' = 1$ on a one-point space; then $\|W\|_{\square,4} = \|W\|_{\square,5} = 1/4$, while $\|W'\|_{\square,4} = \|W'\|_{\square,5} = 0$.

Remark E.5. The constants in (E.12)–(E.15) are best possible. Examples with equality in the left or right inequalities are given by the following matrices, interpreted as functions on $[0,1]^2$, with each row or column in an $n \times n$ matrix corresponding to an interval $I_{in}$ of length $1/n$ (we could use a space $\Omega$ with $n$ points, but we want $\Omega$ to be atomless):

[Matrices for (E.12), (E.13), (E.14) and (E.15): not legible in this copy.]



E.2. Complex and Hilbert space valued functions. Another set of versions of the cut norm use (4.3) but consider other sets of functions $f$ and $g$. For example, we may take the supremum over all complex-valued functions $f$ and $g$ with $|f|, |g| \le 1$, i.e.

$$\|W\|_{\square,\mathbb{C}} := \sup_{\|f\|_\infty, \|g\|_\infty \le 1} \Bigl| \int_{\Omega^2} W(x,y) f(x) g(y)\, d\mu(x)\, d\mu(y) \Bigr|. \qquad \text{(E.17)}$$

It is easily seen that $\|W\|_{\square,\mathbb{C}} \le 2\|W\|_{\square,2}$, which can be improved to [4f]

$$\|W\|_{\square,2} \le \|W\|_{\square,\mathbb{C}} \le \sqrt2\, \|W\|_{\square,2}, \qquad \text{(E.18)}$$

which is best possible. (For an example, consider a two-point space $\Omega = \{1,2\}$ with $\mu\{1\} = \mu\{2\} = 1/2$, and let $W_1(x,y) = 1/2$ and $W_2(x,y) = \mathbf{1}\{x = y = 1\}$. Then $\|W_1 - W_2\|_{\square,2} = 1/4$ but $\|W_1 - W_2\|_{\square,\mathbb{C}} = \sqrt2/4$, obtained by taking $f = g = (1, i)$ in (E.17).)
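The two-point example is small enough to check directly. Since $(f,g) \mapsto \int W f g$ is bilinear, the supremum defining $\|\cdot\|_{\square,2}$ is attained at functions with values $\pm1$, so a search over four sign vectors for each of $f$ and $g$ suffices. The sketch below (ours) compares this with the complex choice $f = g = (1, i)$; points are indexed from 0.

```python
from itertools import product

# W_1 - W_2 on Omega = {1, 2} with mu{1} = mu{2} = 1/2 (indices 0 and 1 here).
M = [[-0.5, 0.5], [0.5, 0.5]]

def pairing(M, f, g):
    """Integral of M(x, y) f(x) g(y) over Omega^2; each pair (x, y) has mass 1/4."""
    return sum(M[x][y] * f[x] * g[y] for x in range(2) for y in range(2)) / 4

# Real version: bilinear, so the sup over |f|, |g| <= 1 is at sign vectors.
real_norm = max(abs(pairing(M, f, g))
                for f in product((-1, 1), repeat=2)
                for g in product((-1, 1), repeat=2))
complex_value = abs(pairing(M, (1, 1j), (1, 1j)))
print(real_norm, complex_value)
```

The complex value $|(-1+i)/4| = \sqrt2/4$ meets the upper bound in (E.18) exactly.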

An interesting version is to allow $f$ and $g$ to take values in the unit ball of an arbitrary Hilbert space $H$ and define

$$\|W\|_{\square,H} := \sup_{\substack{f,g : \Omega \to H \\ \|f\|_\infty, \|g\|_\infty \le 1}} \Bigl| \int_{\Omega^2} W(x,y) \langle f(x), g(y) \rangle\, d\mu(x)\, d\mu(y) \Bigr|. \qquad \text{(E.19)}$$

(Since we only consider real $W$, it is easy to see that it does not matter whether we allow real or complex Hilbert spaces in (E.19).) In this case, the equivalence with $\|W\|_{\square,2}$ is a form of the famous Grothendieck inequality [31], which says that

$$\|W\|_{\square,2} \le \|W\|_{\square,H} \le K_G \|W\|_{\square,2}, \qquad \text{(E.20)}$$

where the constant $K_G$, the real Grothendieck constant, is known to satisfy $\pi/2 \le K_G \le \pi/\bigl(2\log(1+\sqrt2)\bigr) \approx 1.78221$. (The lower bound is improved in an unpublished manuscript [55].) We also have $\|W\|_{\square,\mathbb{C}} \le \|W\|_{\square,H} \le K_G^{\mathbb{C}} \|W\|_{\square,\mathbb{C}}$, where $K_G^{\mathbb{C}}$ is the complex Grothendieck constant, known to satisfy $4/\pi \le K_G^{\mathbb{C}} \le 1.40491$ [HI]. Moreover, $\|W\|_{\square,\mathbb{C}}$ is obtained by taking only a fixed Hilbert space of dimension 2 in (E.19).

See [H] for an algorithmic use of the version $\|\cdot\|_{\square,H}$ of the cut norm and Grothendieck's inequality.

E.3. Other operator norms. If $W$ is a kernel on $\Omega$, then it defines an integral operator $T_W : f \mapsto \int_\Omega W(x,y) f(y)\, d\mu(y)$ (for suitable $f$). We have already noted in Remark 4.2 that $\|\cdot\|_{\square,2}$ is the operator norm of $T_W$ as an operator $L^\infty(\Omega) \to L^1(\Omega)$, but we may also consider other spaces.

Let, for $1 \le p,q \le \infty$, $\|T\|_{p,q}$ denote the norm of $T$ as an operator $L^p(\Omega) \to L^q(\Omega)$.

Lemma E.6. If $|W| \le 1$, then for all $p,q \in [1,\infty]$,

$$\|W\|_{\square,2} = \|T_W\|_{\infty,1} \le \|T_W\|_{p,q} \le \sqrt2\, \|W\|_{\square,2}^{\min(1-1/p,\,1/q)}.$$

Consequently, for any fixed $p > 1$ and $q < \infty$, if $W_1, W_2, \dots$ and $W$ are graphons defined on the same space $\Omega$, then $\|W_n - W\|_\square \to 0$ if and only if $\|T_{W_n} - T_W\|_{p,q} \to 0$.

Proof. We know that $\|W\|_{\square,2} = \|T_W\|_{\infty,1}$. Moreover, for any probability space, the inclusions $L^\infty \subseteq L^p$ and $L^p \subseteq L^1$ have norm 1, and thus $\|T\|_{\infty,1} \le \|T\|_{p,q}$ for any operator $T$.

Let $\theta := \min(1-1/p, 1/q)$, so $1-\theta = \max(1/p, 1-1/q)$, and define $p_0,q_0 \in [1,\infty]$ by $1/p = (1-\theta)/p_0$ and $1 - 1/q = (1-\theta)(1-1/q_0)$. Further, let $p_1 = \infty$ and $q_1 = 1$. Then $(1/p, 1/q) = (1-\theta)(1/p_0, 1/q_0) + \theta(1/p_1, 1/q_1)$, and it follows from the Riesz–Thorin interpolation theorem (see e.g. [3, Theorem 1.1.1]) that, provided we work with complex $L^p$ spaces,

$$\|T_W\|_{p,q} \le \|T_W\|_{p_0,q_0}^{1-\theta}\, \|T_W\|_{p_1,q_1}^{\theta}.$$

By (E.17) and (E.18),

$$\|T_W\|_{p_1,q_1} = \|T_W\|_{\infty,1} = \|W\|_{\square,\mathbb{C}} \le \sqrt2\, \|W\|_{\square,2},$$

and the assumption $|W| \le 1$ implies $\|T_W\|_{p_0,q_0} \le \|T_W\|_{1,\infty} \le \|W\|_\infty \le 1$. The result follows. □

We consider the case $p = q = 2$ further, i.e., we regard $T_W$ as an operator on the Hilbert space $L^2(\Omega)$. If $W$ is bounded (or, more generally, in $L^2(\Omega^2)$), then $T_W$ is bounded on $L^2$; it is further compact (and Hilbert–Schmidt) and self-adjoint (because $W$ is symmetric). Hence $T_W$ has a sequence of eigenvalues $(\lambda_n)$. We define, for $1 \le p < \infty$, the Schatten $S_p$-norm of $T_W$ to be

$$\|T_W\|_{S_p} := \|(\lambda_n)\|_{\ell^p} = \Bigl(\sum_n |\lambda_n|^p\Bigr)^{1/p}. \qquad \text{(E.21)}$$

(See e.g. [30], where also the non-selfadjoint case is treated.) It is well known that for $p = 2$, $\|\cdot\|_{S_2}$ equals the Hilbert–Schmidt norm and thus

$$\|T_W\|_{S_2} = \|W\|_{L^2(\Omega^2)}. \qquad \text{(E.22)}$$

If $p = 2k$ is an even integer $\ge 4$, then (E.21) yields

$$\|T_W\|_{S_{2k}}^{2k} = \sum_n \lambda_n^{2k} = \operatorname{Tr}\bigl(T_W^{2k}\bigr) = t(C_{2k}, W), \qquad \text{(E.23)}$$

where the graph $C_{2k}$ is the cycle of length $2k$.
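For a step graphon given by a symmetric $m\times m$ matrix $A$ on $m$ equal intervals, $T_W$ vanishes on functions with zero block averages and acts on step functions as the matrix $A/m$, so (E.23) reduces to $\operatorname{Tr}\bigl((A/m)^{2k}\bigr) = t(C_{2k},W)$. A sketch (names and the sample matrix are ours) comparing the trace with a direct evaluation of (C.1):

```python
from itertools import product

def matmul(P, Q):
    """Product of two square matrices given as lists of lists."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace_power(A, power):
    """Tr((A/m)^power): the sum of lambda_n^power over the eigenvalues of T_W."""
    m = len(A)
    B = [[a / m for a in row] for row in A]
    P = B
    for _ in range(power - 1):
        P = matmul(P, B)
    return sum(P[i][i] for i in range(m))

def cycle_density(A, length):
    """t(C_length, W) evaluated directly from (C.1) as an average over block labels."""
    m = len(A)
    total = 0.0
    for phi in product(range(m), repeat=length):
        weight = 1.0
        for i in range(length):
            weight *= A[phi[i]][phi[(i + 1) % length]]
        total += weight
    return total / m ** length

A = [[0.2, 0.9, 0.4], [0.9, 0.5, 0.1], [0.4, 0.1, 0.7]]
for length in (4, 6):
    print(length, trace_power(A, length), cycle_density(A, length))
```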

Lemma E.7.

(i) For $2 < p < \infty$, if $|W| \le 1$, then

$$\|W\|_{\square,2} = \|T_W\|_{\infty,1} \le \|T_W\|_{2,2} \le \|T_W\|_{S_p} \le \sqrt2\, \|W\|_{\square,2}^{1/2 - 1/p}.$$

Consequently, for any fixed $p > 2$, if $W_1, W_2, \dots$ and $W$ are graphons defined on the same space $\Omega$, then $\|T_{W_n} - T_W\|_{S_p} \to 0$ if and only if $\|W_n - W\|_\square \to 0$.

(ii) For $p = 2$, if $|W| \le 1$, then

$$\|W\|_\square \le \|T_W\|_{S_2} = \|W\|_{L^2} \le \|W\|_{L^1}^{1/2}.$$

Consequently, if $W_1, W_2, \dots$ and $W$ are graphons defined on the same space $\Omega$, then $\|T_{W_n} - T_W\|_{S_2} \to 0$ if and only if $\|W_n - W\|_{L^1} \to 0$.

Proof. (i): The first inequality is in Lemma E.6, and the second is trivial, since the operator norm $\|T_W\|_{2,2} = \sup_n |\lambda_n|$. Further, by this and (E.21),

$$\|T_W\|_{S_p}^p = \sum_n |\lambda_n|^p \le \sum_n |\lambda_n|^2 \sup_n |\lambda_n|^{p-2} = \|T_W\|_{S_2}^2\, \|T_W\|_{2,2}^{p-2}. \qquad \text{(E.24)}$$

We have $\|T_W\|_{S_2} = \|W\|_{L^2(\Omega^2)} \le 1$ by (E.22), and $\|T_W\|_{2,2} \le \sqrt2\, \|W\|_{\square,2}^{1/2}$ by Lemma E.6, and the result follows.

(ii): Immediate by (E.22) and standard inequalities (e.g. Hölder). □
In particular, by (i) with $p = 4$ and (E.23), if $|W| \le 1$, then $\|W\|_{\square,2} \le t(C_4,W)^{1/4} \le \sqrt2\,\|W\|_{\square,2}^{1/4}$, or

$$\tfrac14\, t(C_4,W) \le \|W\|_{\square,2} \le t(C_4,W)^{1/4}.$$

This was proved in [11, Lemma 7.1] (by a slightly different argument, using a version of (C.3)), where also an application is given.
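The two-sided bound above can be tested exactly on small step graphons. In this sketch we take $\|W\|_{\square,2} = \|T_W\|_{\infty,1}$, the supremum of $|\int W(x,y)f(x)g(y)|$ over real $f, g$ with $|f|, |g| \le 1$; for an $n \times n$ step function with values $M_{ij}$ the supremum is attained at $\pm1$ step functions, which gives $\max_{s \in \{\pm1\}^n} \sum_j |\sum_i s_i M_{ij}| / n^2$. The density $t(C_4, W)$ is computed as $\operatorname{Tr}(M^4)/n^4$. The brute force over sign vectors is exponential in $n$, so this is only meant for tiny examples; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def cut_norm_2(m):
    """sup |int W(x,y) f(x) g(y)| over |f|, |g| <= 1 for the n x n step graphon with values m.

    The supremum is attained at +-1 step functions; for fixed signs s the best
    choice of g is the sign of each entry of s^T M, hence the sum of |.| below.
    """
    n = len(m)
    best = Fraction(0)
    for s in product((1, -1), repeat=n):
        row = [sum(s[i] * m[i][j] for i in range(n)) for j in range(n)]
        best = max(best, sum(abs(v) for v in row))
    return best / Fraction(n) ** 2


def c4_density(m):
    """t(C_4, W) = Tr(M^4) / n^4, computed as the sum of squared entries of M^2 (M symmetric)."""
    n = len(m)
    m2 = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return sum(v * v for row in m2 for v in row) / Fraction(n) ** 4


def satisfies_c4_bounds(m):
    """Check t(C_4,W)/4 <= ||W||_{box,2} <= t(C_4,W)^{1/4} exactly (the latter as cut^4 <= t)."""
    cut, t = cut_norm_2(m), c4_density(m)
    return t / 4 <= cut and cut ** 4 <= t
```

Exact rational arithmetic matters here: for the rank-one kernel obtained by subtracting $1/2$ from the graphon of the 4-cycle graph, the upper bound holds with equality ($\|W\|_{\square,2} = 1/2$, $t(C_4,W) = 1/16$).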

Remark E.8. There is no corresponding result for $p < 2$. In fact, $\|T_W\|_{S_p}$ may be infinite for a graphon $W$. To see this, let first $W$ be constant $1/2$ on $[0,1]$ and let $(G_n)$ be a quasirandom sequence of graphs with $G_n \to W$. Let $W_n := W_{G_n}$, so $\delta_\square(W_n, W) \to 0$. By [14, Lemma 5.3], we may label the graphs $G_n$ such that $\|W_n - W\|_\square \to 0$.

By (E.22), since $|W_n - W| = 1/2$ everywhere, $\|T_{W_n - W}\|_{S_2} = \|W_n - W\|_{L^2} = 1/2$. On the other hand, arguing as in (E.24),

$$\|T_{W_n-W}\|_{S_2}^2 \le \|T_{W_n-W}\|_{S_p}^p\,\|T_{W_n-W}\|_{2,2}^{2-p} \le \sqrt2\,\|T_{W_n-W}\|_{S_p}^p\,\|W_n - W\|_{\square,2}^{1-p/2}.$$

Since the left-hand side is constant and the last factor tends to 0, it follows that $\|T_{W_n-W}\|_{S_p} \to \infty$. Further, $\|W_n - W\|_\infty \le 1$. It is now an easy consequence of the closed graph theorem that there exist bounded functions $W$ on $[0,1]^2$ such that $\|T_W\|_{S_p} = \infty$, and by linearity there must exist such a graphon. (An explicit $W$ is given by a well-known analytic construction [30, §III.10.3, p. 118]: let $W(x,y) = f(x-y)$ on $[0,1]^2$, where $f$ is a continuous even function with period 1 on $\mathbb R$ such that $\sum_n |\hat f(n)|^p = \infty$ for all $p < 2$; such a function was constructed by Carleman [16], see also [U V.4.9].)

Appendix F. The weak topology on $\mathcal W(\Omega)$

Consider the space $\mathcal W = \mathcal W(\Omega)$ of graphons on a fixed probability space $\Omega$. We have discussed two different metrics on this space, given by the norms $\|\cdot\|_{L^1}$ and $\|\cdot\|_\square$; these give two different topologies on $\mathcal W(\Omega)$.

Another topology on $\mathcal W(\Omega)$ is the weak topology $\sigma$, regarding $\mathcal W(\Omega)$ as a subset of $L^1(\Omega^2)$. This topology is generated by the functionals $\chi_h : W \mapsto \int_{\Omega^2} hW$ for $h \in L^\infty(\Omega^2)$, in the standard sense that it is the weakest topology that makes all these maps continuous. Actually, since the functions in $\mathcal W$ are uniformly bounded, we obtain the same topology from many different families of such functionals.

We state this also for subsets of $L^1(\Omega)$, writing $\chi_h(f) := \int_\Omega hf$ for $h \in L^\infty(\Omega)$ and $f \in L^1(\Omega)$. Thus the weak topology on $L^1(\Omega)$ (or a subset of it) is the topology generated by $\chi_h$, $h \in L^\infty(\Omega)$. Recall further that a subset $\mathcal H$ of a topological vector space is total if the set of linear combinations of elements of $\mathcal H$ is dense in the space.

Lemma F.1. (i) Let $\mathcal H$ be a total set in $L^1(\Omega)$, and let $X$ be a subset of $L^1(\Omega,\mu)$ consisting of uniformly bounded functions: $\sup_{f \in X}\|f\|_\infty < \infty$. Then the functionals $\{\chi_h : h \in \mathcal H\}$ generate the weak topology on $X$.

(ii) Let $\mathcal H$ be a total set in $L^1(\Omega^2)$. Then the functionals $\{\chi_h : h \in \mathcal H\}$ generate the weak topology on $\mathcal W(\Omega)$.
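Proof. (i): Let $\tau_{\mathcal H}$ be the topology on $X$ generated by $\{\chi_h : h \in \mathcal H\}$, and let $\mathcal H'$ be the set of all $g \in L^1(\Omega)$ such that $\chi_g$ is continuous $(X, \tau_{\mathcal H}) \to \mathbb R$. By the definition of $\tau_{\mathcal H}$, $\mathcal H \subseteq \mathcal H'$; further, $\mathcal H$ and $\mathcal H'$ generate the same topology, i.e., $\tau_{\mathcal H} = \tau_{\mathcal H'}$.

$\mathcal H'$ is clearly a linear subspace of $L^1(\Omega)$, and since we have assumed that $\mathcal H$ is total, $\mathcal H'$ is dense in $L^1(\Omega)$. If $g \in L^1(\Omega)$, there thus exists a sequence $g_n \in \mathcal H'$ with $\|g_n - g\|_{L^1} \to 0$. Since the functions in $X$ are uniformly bounded, $|\chi_{g_n}(f) - \chi_g(f)| \le \|g_n - g\|_{L^1}\sup_{f' \in X}\|f'\|_\infty$ for $f \in X$, so $\chi_{g_n} \to \chi_g$ uniformly on $X$, and thus $\chi_g$ too is $\tau_{\mathcal H}$-continuous; hence $g \in \mathcal H'$. Consequently, $\mathcal H' = L^1(\Omega)$ and thus $\tau_{\mathcal H} = \tau_{\mathcal H'} = \tau_{L^1(\Omega)}$. Thus every total $\mathcal H \subseteq L^1(\Omega)$ generates the same topology. One such $\mathcal H$ is $L^\infty(\Omega)$, which defines the weak topology (by definition).

(ii): This is a special case, since $\Omega^2$ is another probability space. □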

In particular, the weak topology on $\mathcal W(\Omega)$ is also the topology generated by the functionals $W \mapsto \int_{\Omega^2} hW$, $h \in L^1(\Omega^2)$, i.e., it equals the weak* topology on $\mathcal W(\Omega)$, regarded as a subset of $L^\infty(\Omega^2)$.

Remark F.2. Another example of a total set in $L^1(\Omega^2)$ is the set of rectangle indicators $\mathbf 1_S(x)\mathbf 1_T(y)$ for $S, T \subseteq \Omega$. Thus the weak topology is also generated by the functionals $W \mapsto \int_{S\times T} W$. Note that the metric given by $\|\cdot\|_\square$ uses the same functionals, but with an important difference: $\|W_n - W\|_\square \to 0$ if and only if $\int_{S\times T} W_n \to \int_{S\times T} W$ uniformly for all $S, T \subseteq \Omega$, while $W_n \to W$ in the weak topology if and only if each $\int_{S\times T} W_n \to \int_{S\times T} W$, without any uniformity requirement. (Similarly, $\|W_n - W\|_{L^1} \to 0$ if and only if $\int hW_n \to \int hW$ uniformly for all $h$ with $\|h\|_\infty \le 1$.)

Lemma F.3. The weak topology is weaker than the cut norm topology, i.e., the identity maps $(\mathcal W, \|\cdot\|_{L^1}) \to (\mathcal W, \|\cdot\|_\square) \to (\mathcal W, \sigma)$ are continuous.

Proof. Immediate by Remark F.2. □

Theorem F.4. The topological space $(\mathcal W(\Omega), \sigma)$ is compact.

Proof. $\mathcal W$ is a weak* closed subset of the unit ball of $L^\infty(\Omega^2) = L^1(\Omega^2)^*$, so this follows from the Banach-Alaoglu theorem. □

One advantage of the weak topology is thus that it is compact, in contrast to the topologies defined by the norms $\|\cdot\|_\square$ and $\|\cdot\|_{L^1}$, which are not compact (in general, e.g. if $\Omega = [0,1]$); see Example F.6 below. (Recall that, nevertheless, the quotient space $(\widehat{\mathcal W}, \delta_\square)$ is compact, and that this is a very important property.)

However, a serious drawback of the weak topology is that the quotient map $\mathcal W(\Omega) \to \widehat{\mathcal W}$ is not continuous in the weak topology. Equivalently, not all of the homomorphism densities $W \mapsto t(F, W)$ defined in Appendix C (for fixed $F$) are continuous in the weak topology. More precisely, for example, $W \mapsto t(K_3, W)$ is not continuous in the weak topology on $\mathcal W([0,1])$; see Example F.6.
Remark F.5. There are graphs $F$ such that $W \mapsto t(F, W)$ is weakly continuous (i.e., continuous for $\sigma$), for example $K_2$, since $t(K_2, W) = \int_{\Omega^2} W$. We show in Lemma F.7 below that $K_2$ is essentially the only such exceptional case.

Example F.6. Take $\Omega = [0,1]$. Let $g_n(x) = \operatorname{sgn}(\sin(2\pi n x))$ and $W_n(x,y) = \frac12 - \frac12 g_n(x)g_n(y)$. Then $g_n(x) \in \{\pm1\}$ for a.e. $x$, and $W_n$ is $\{0,1\}$-valued; in fact, $W_n$ equals $W_{K_{n,n}}$ for a complete bipartite graph $K_{n,n}$. (A less combinatorial alternative is to take $g_n(x) = \sin(2\pi n x)$.)

We have $g_n = g_1^{\varphi_n}$ and $W_n = W_1^{\varphi_n}$, where $\varphi_n(x) = nx \bmod 1$ as in Example 8.2. Consequently, $W_n \cong W_1$, and thus $W_n = W_1$ in the quotient space $\widehat{\mathcal W}$, i.e., $\delta_\square(W_n, W_1) = 0$; in particular, $W_n \to W_1$ in $(\widehat{\mathcal W}, \delta_\square)$.

On the other hand, for any $h \in L^1([0,1]^2)$, $\int_{[0,1]^2} h(x,y)\,g_n(x)g_n(y)\,dx\,dy \to 0$, and thus $W_n \to \frac12$ in $(\mathcal W([0,1]), \sigma)$.



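If the quotient map $\mathcal W([0,1]) \to \widehat{\mathcal W}$ were continuous for $\sigma$, then $W_n \to \frac12$ in $\widehat{\mathcal W}$, and since we already know $W_n \to W_1$ in $\widehat{\mathcal W}$, we would have $W_1 = \frac12$ in $\widehat{\mathcal W}$, i.e., $W_1 \cong \frac12$, which contradicts e.g. Corollary 8.12. Consequently, the quotient map $(\mathcal W([0,1]), \sigma) \to (\widehat{\mathcal W}, \delta_\square)$ is not continuous.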

This also shows that $(\mathcal W([0,1]), \|\cdot\|_\square)$ and, a fortiori, $(\mathcal W([0,1]), \|\cdot\|_{L^1})$ are not compact. Indeed, if one of these spaces were compact, then $W_n$ would have a convergent subsequence in it, and thus in $(\mathcal W, \|\cdot\|_\square)$, with a limit $W$, say. Since both maps $(\mathcal W, \|\cdot\|_\square) \to (\mathcal W, \sigma)$ and $(\mathcal W, \|\cdot\|_\square) \to (\widehat{\mathcal W}, \delta_\square)$ are continuous, the subsequence would converge to $W$ in both $(\mathcal W, \sigma)$ and $(\widehat{\mathcal W}, \delta_\square)$ too; hence both $W = \frac12$ a.e. and $W \cong W_1$, so again $W_1 \cong \frac12$, a contradiction.

Furthermore, with $W = \frac12$, so that $W_n \to W$ weakly, we have $t(K_3, W_n) = 0$, while $t(K_3, W) = \frac18 > 0$; hence $W \mapsto t(K_3, W)$ is not weakly continuous.
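The contrast between weak and cut norm convergence in this example can be made concrete. The sketch below computes $\int_0^a g_n$ exactly, using that the primitive of $g_1$ is the 1-periodic tent function $u \mapsto \min(\{u\}, 1-\{u\})$, so that $\int_{[0,a]\times[0,b]}(W_n - \frac12) = -\frac12\int_0^a g_n \int_0^b g_n$ is at most $1/(8n^2)$ in absolute value. (Linear combinations of the indicators of $[0,a]\times[0,b]$ are dense in $L^1([0,1]^2)$, so by Lemma F.1 these functionals already generate the weak topology on $\mathcal W([0,1])$.) In contrast, on $S_n \times S_n$ with $S_n := \{g_n = 1\}$ the integral stays equal to $-1/8$, so $\|W_n - \frac12\|_\square \ge 1/8$ for every $n$. The function names are hypothetical.

```python
import math
from fractions import Fraction


def primitive_g(n, a):
    """int_0^a g_n(x) dx for g_n(x) = sgn(sin(2 pi n x)) and 0 <= a <= 1, computed exactly.

    The primitive of g_1 is the 1-periodic tent function u -> min({u}, 1 - {u}),
    and g_n(x) = g_1(nx), so int_0^a g_n = tent(n a) / n.
    """
    u = Fraction(n) * Fraction(a)
    frac = u - math.floor(u)
    return min(frac, 1 - frac) / n


def rectangle_integral(n, a, b):
    """int over [0,a] x [0,b] of W_n - 1/2, where W_n - 1/2 = -(1/2) g_n(x) g_n(y)."""
    return -Fraction(1, 2) * primitive_g(n, a) * primitive_g(n, b)


def positive_set_integral(n):
    """int over S_n x S_n of W_n - 1/2, where S_n = {g_n = 1} = union of [j/n, j/n + 1/(2n)]."""
    mass = sum(primitive_g(n, Fraction(j, n) + Fraction(1, 2 * n))
               - primitive_g(n, Fraction(j, n))
               for j in range(n))  # equals int_{S_n} g_n = |S_n| = 1/2
    return -Fraction(1, 2) * mass * mass
```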

Lemma F.7. The map $W \mapsto t(F, W)$ is weakly continuous (for $\Omega = [0,1]$, say) if and only if $F$ is a disjoint union of isolated vertices and edges.

Proof. Let $F$ have $m$ vertices and $e$ edges. If every component of $F$ is a vertex or an edge, then $t(F, W) = \big(\int_{\Omega^2} W\big)^e$, which is weakly continuous.

Conversely, suppose that $F$ is a graph such that $W \mapsto t(F, W)$ is weakly continuous. Let $\alpha \in (0, 1/2)$ be rational and let $W_n := W_{G_n}$, where $G_n$ is the complete bipartite graph $K_{\alpha n, (1-\alpha)n}$ (for $n$ such that $\alpha n$ is an integer). Taking the vertices of $G_n$ in suitable (e.g. random) order, we have $W_n \to W$ weakly, where $W = 2\alpha(1-\alpha)$ is a constant graphon. Thus, by assumption, $t(F, W_n) \to t(F, W)$.

If $F$ is not bipartite, then $t(F, W_n) = t(F, G_n) = 0$, while $t(F, W) > 0$, a contradiction.

If $F$ is bipartite, suppose first that $F$ is connected, so that $e \ge m - 1$. Then $F$ has a bipartition where the smaller part has $k \le m/2$ vertices, and thus

$$t(F, W_n) = t(F, G_n) \ge \alpha^k (1-\alpha)^{m-k} \ge 2^{-m}\alpha^{m/2}, \tag{F.1}$$

while

$$t(F, W) = \big(2\alpha(1-\alpha)\big)^e \le 2^e \alpha^{m-1}. \tag{F.2}$$

If $m \ge 3$, then $m/2 < m - 1$, and thus we can choose $\alpha$ so small that $t(F, W_n) > 2t(F, W)$ for all $n$, a contradiction. Hence $m \le 2$.

If $F$ is bipartite and disconnected, we use the same argument for every component of $F$, noting that $t(F, W_n) = t(F, W)$ if $F$ has at most two vertices. It follows that no component of $F$ can have more than two vertices. □
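The comparison behind (F.1) and (F.2) can be reproduced for a specific $F$. The sketch below evaluates $t(F, W)$ exactly for a step graphon with parts of given measures by summing over all assignments of the vertices of $F$ to the parts. With parts of measure $\alpha$ and $1-\alpha$ and the bipartite value matrix it gives $t(F, W_n)$, which does not depend on $n$; with a single part of value $2\alpha(1-\alpha)$ it gives $t(F, W)$. The function names and the choice of test graphs are hypothetical illustrations.

```python
from fractions import Fraction
from itertools import product


def hom_density(edges, m, sizes, values):
    """t(F, W) for a step graphon W with parts of measure sizes[i] and values values[i][j].

    F has vertices 0, ..., m-1 and the given edges; we sum over all ways of
    assigning the vertices of F to parts of W.
    """
    total = Fraction(0)
    for phi in product(range(len(sizes)), repeat=m):
        weight = Fraction(1)
        for v in range(m):
            weight *= sizes[phi[v]]
        for u, v in edges:
            weight *= values[phi[u]][phi[v]]
        total += weight
    return total


def bipartite_vs_constant(edges, m, alpha):
    """(t(F, W_n), t(F, W)) for W_n the graphon of K_{alpha n, (1 - alpha) n}
    (independent of n) and W the constant graphon 2 alpha (1 - alpha)."""
    alpha = Fraction(alpha)
    t_bipartite = hom_density(edges, m, [alpha, 1 - alpha], [[0, 1], [1, 0]])
    t_constant = hom_density(edges, m, [Fraction(1)], [[2 * alpha * (1 - alpha)]])
    return t_bipartite, t_constant


PATH_P3 = [(0, 1), (1, 2)]  # m = 3 vertices, e = 2 edges
for a in (Fraction(1, 4), Fraction(1, 10), Fraction(1, 100)):
    t_n, t_w = bipartite_vs_constant(PATH_P3, 3, a)
    print(a, t_n / t_w)  # ratio 1/(4 a (1 - a)) grows as a -> 0
```

For the path $P_3$ with two edges one gets $t(F, W_n) = \alpha(1-\alpha)$ and $t(F, W) = 4\alpha^2(1-\alpha)^2$, whose ratio $1/(4\alpha(1-\alpha))$ blows up as $\alpha \to 0$, as in the proof; for $K_2$ the two values agree.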

See Chatterjee and Varadhan [17] for a recent application of the weak topology on $\mathcal W$.



Appendix G. Separability in Lebesgue spaces 

In many cases, the Banach space $L^1(\Omega, \mathcal F, \mu)$ is separable. For example, this is the case if $\Omega = [0,1]$ with any finite Borel measure $\mu$. (One example of a countable dense set is the set of polynomials with rational coefficients; this is dense e.g. by the monotone class theorem [37, Theorem A.1].) Hence, by Theorem A.4, $L^1(\Omega, \mathcal F, \mu)$ is separable for every Borel probability space $(\Omega, \mathcal F, \mu)$. This includes almost all examples used in graph limit theory.

However, there are cases when $L^1(\Omega, \mathcal F, \mu)$ is non-separable. For example, this is the case when $(\Omega, \mu)$ is an uncountable product $([0,1], \nu)^{\mathbb R}$ or $(\{0,1\}, \nu)^{\mathbb R}$, with $\nu$ the uniform distribution, say. (Any uncountable product of non-trivial spaces will do.) In this case there are some technical difficulties and we sometimes have to be more careful.

Recall that the elements $f$ of $L^1(\Omega, \mathcal F, \mu)$ formally are equivalence classes of functions, so to define pointwise values $f(x)$ we have to choose a representative of $f$. This is usually harmless, but it may be a serious problem if we want to define $f(x)$ for many $f$ simultaneously, in particular if we want to define a measurable evaluation map $(f, x) \mapsto f(x)$ from $L^1(\Omega, \mu) \times \Omega$ to $\mathbb R$.

The following lemma shows that this is possible when $L^1(\Omega, \mu)$ is separable, and more generally on $A \times \Omega$ when $A \subseteq L^1(\Omega, \mu)$ is a separable subspace. Note, however, that there is no such measurable evaluation map in general without a separability assumption; see Example G.2 below. This justifies stating and proving the lemma carefully, although it may look obvious.

Lemma G.1. If $A$ is a closed separable subspace of $L^1(\Omega, \mathcal F, \mu)$, then there is a measurable function $\Phi : A \times \Omega \to \mathbb R$ such that for every $f \in A$, $\Phi(f, x) = f(x)$ for a.e. $x \in \Omega$.

Proof. There exists a countable dense set $D \subseteq A$. Each element of $D$ is an element of $L^1(\Omega, \mathcal F, \mu)$, i.e., an equivalence class of measurable functions on $\Omega$; we fix one representative for each element of $D$ and regard the elements of $D$ as these fixed functions. Write $D = \{d_1, d_2, \dots\}$ with some arbitrary ordering of the elements.
Since $D$ is dense in $A$, we may recursively define maps $H_i : A \to D$ such that

$$\Big\| f - \sum_{i=1}^{k} H_i(f) \Big\|_{L^1} < 2^{-k}, \qquad k \ge 1, \tag{G.1}$$

by defining $H_k(f)$ as the first element of $D$ that satisfies (G.1). Then each $H_i : A \to D$ is measurable. Further, since $H_i(f) = \big(f - \sum_{j<i} H_j(f)\big) - \big(f - \sum_{j\le i} H_j(f)\big)$, (G.1) implies $\|H_i(f)\|_{L^1} < 3 \cdot 2^{-i}$ for $i \ge 2$, so $\int_\Omega \sum_{i=1}^\infty |H_i(f)|\,d\mu = \sum_{i=1}^\infty \|H_i(f)\|_{L^1} < \infty$ for every $f \in A$, which implies that $\sum_{i=1}^\infty H_i(f)(x)$ converges absolutely a.e. Moreover, (G.1) implies by dominated convergence $\big\| f - \sum_{i=1}^\infty H_i(f)(x) \big\|_{L^1} = 0$, so $\sum_{i=1}^\infty H_i(f)(x) = f(x)$ a.e. We now define



$$\Phi(f,x) := \begin{cases} \sum_{i=1}^\infty H_i(f)(x), & \text{if the sum converges;} \\ 0, & \text{otherwise.} \end{cases}$$

Each map $(f, x) \mapsto H_i(f)(x)$ is measurable, and thus $\Phi$ is measurable. □
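To make the recursive choice in (G.1) concrete, here is a toy finite model, not the general measurable construction: $\Omega$ is a finite set with the uniform probability measure, $A$ consists of all functions on $\Omega$, and, as a hypothetical stand-in for "the first element of $D$", each $H_k(f)$ is obtained by rounding the current residual $f - \sum_{i<k} H_i(f)$ to the grid $2^{-(k+1)}\mathbb Z$, which gives an element of the countable dense set of dyadic-valued functions. The sketch checks (G.1) at every step; the bound $\|H_i(f)\|_{L^1} < 3 \cdot 2^{-i}$ for $i \ge 2$ can then be verified on its output. All names are hypothetical.

```python
from fractions import Fraction


def l1_norm(v):
    """L^1 norm for the uniform probability measure on a finite set {0, ..., N-1}."""
    return sum(abs(x) for x in v) / len(v)


def round_to_grid(v, h):
    """Hypothetical selection rule standing in for 'the first element of D':
    round every value to the nearest multiple of h (a dyadic-valued function)."""
    return [round(x / h) * h for x in v]


def greedy_terms(f, steps):
    """H_1(f), ..., H_steps(f) with ||f - (H_1(f) + ... + H_k(f))||_{L^1} < 2^{-k}, as in (G.1)."""
    terms, partial = [], [Fraction(0)] * len(f)
    for k in range(1, steps + 1):
        residual = [a - b for a, b in zip(f, partial)]
        h_k = round_to_grid(residual, Fraction(1, 2 ** (k + 1)))  # error <= 2^{-(k+2)}
        terms.append(h_k)
        partial = [a + b for a, b in zip(partial, h_k)]
        assert l1_norm([a - b for a, b in zip(f, partial)]) < Fraction(1, 2 ** k)  # (G.1)
    return terms
```

In this finite model the series converges at every point; in the general setting it is the a.e. convergence and the measurability of $f \mapsto H_i(f)$ that the proof has to supply.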

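Example G.2. Let $\Omega_0$ be the two-point set $\{0,1\}$, with uniform measure $\mu_0\{0\} = \mu_0\{1\} = 1/2$, and let $(\Omega, \mu)$ be the uncountable product $(\Omega_0, \mu_0)^{\mathbb R}$.

Any measurable function $\Phi : L^1(\Omega, \mu) \times \Omega \to \mathbb R$ depends only on countably many coordinates in $L^1(\Omega, \mu) \times \Omega = L^1(\Omega, \mu) \times \Omega_0^{\mathbb R}$, i.e., there is a countable set $C \subset \mathbb R$ such that if $x = (x_r)_{r\in\mathbb R}$ and $y = (y_r)_{r\in\mathbb R}$ are elements of $\Omega = \Omega_0^{\mathbb R}$ with $x_r = y_r$ for $r \in C$, then

$$\Phi(f, x) = \Phi(f, y) \quad \text{for all } f \in L^1(\Omega, \mu). \tag{G.2}$$

Fix $s \notin C$ and define $\sigma : \Omega \to \Omega$ by $\sigma : (x_r)_r \mapsto (x'_r)_r$ with $x'_r = x_r$ for $r \ne s$ and $x'_s = 1 - x_s$; note that $\sigma$ is measure-preserving. By (G.2), $\Phi(f, \sigma(x)) = \Phi(f, x)$ for every $f$ and $x \in \Omega$. If $\Phi(f, x) = f(x)$ for a.e. $x$, then, since $\sigma$ is measure-preserving, also $\Phi(f, \sigma(x)) = f(\sigma(x))$ for a.e. $x$, and thus $f(x) = f(\sigma(x))$ for a.e. $x$, which obviously is incorrect for the coordinate function $f(x) = x_s$.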

Consequently, there exists no measurable evaluation map $\Phi : L^1(\Omega, \mu) \times \Omega \to \mathbb R$ such that $\Phi(f, x) = f(x)$ for every $f$ and a.e. $x$.

In fact, it can be shown (again using the monotone class theorem) that if $A$ is any measurable space and $\Phi : A \times \Omega \to \mathbb R$ is measurable and such that $x \mapsto \Phi(a, x) \in L^1(\Omega, \mathcal F, \mu)$ for every $a \in A$, then these functions all lie in some separable subspace of $L^1(\Omega, \mathcal F, \mu)$. This shows that the condition in Lemma G.1 that $A$ be separable is both necessary and sufficient for the conclusion of the lemma.